False-positive science

The Atlantic reports on the dynamics of yet another group of scientists coming to grips with having wasted time and resources chasing down a dead end. (link) It's a good read but long. Here is the gist of it:

Almost 20 years ago, some researchers made a huge splash by claiming to have discovered the "depression gene". That one gene eventually engendered 450 publications and, counting related genes, over 1,000. A recent large-scale "validation" study is likely to bring down the entire cottage industry - the depression gene turns out to have little explanatory power for depression after all.

Gene data is an example of Big Data. Big Data can be big in terms of the number of individuals in the dataset, or the number of measurements per individual. Two decades ago, scale was attained by virtue of more measurements, not more individuals. The original study looked at about 300 individuals, but each person's genome is vast.

The basic analysis is to compare the average depressed individual with the average not-depressed individual in the sample. The data analyst sifts through large numbers of genes to find one or a few that are highly correlated with having depression. This is a classic fishing expedition, because of the large number of candidate genes, and also because of the large number of ways to define depression.

Such an analysis rides on top of a "model" of the world in which a single gene is responsible for depression. Over the years, the scientific community has discovered that this model is wrong. The new model assumes depression is indicated by a large set of genes, each contributing a weak effect. This type of structure is very hard to elicit from the typical datasets of the past - those with numerous measurements on few individuals. Nowadays, we have data on lots of individuals, but the sourcing of the data and other problems pose formidable challenges. It's also not clear how a model that spreads the blame thinly across a large number of genes can be used for treatment.

Science is proceeding as it should - weak theories are overturned with more research. The article laments that it took 20 years to turn the tide, that earlier warnings were ignored, that the publish-or-perish culture in academia creates perverse incentives, that scientific studies get retracted, and so on.

***

I recently wrote about the challenge of Big Data expanding the variety of measurements here. Also, in writing Numbersense (link), I was concerned that the explosion of data collection causes an avalanche of false-positive science.
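
To make the fishing-expedition point concrete, here is a minimal simulation sketch in Python. It is not the analysis from the original study or from the validation study. It assumes a sample of 300 individuals (the only number taken from the story above), a hypothetical 2,000 candidate "genes" that are pure noise, a depression label assigned at random to half the sample, and a conventional cutoff of 1.96 on a two-sample z-like statistic. The function name and all parameters are made up for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def count_false_positives(n_people=300, n_genes=2000, threshold=1.96, seed=0):
    """Count noise 'genes' that look associated with a random label.

    Returns (number of genes passing the threshold, largest |z| seen).
    Every gene is unrelated to the label by construction, so every hit
    is a false positive.
    """
    rng = random.Random(seed)
    # Assumption: half the sample is labeled "depressed", purely at random.
    labels = [i < n_people // 2 for i in range(n_people)]
    rng.shuffle(labels)

    hits = 0
    largest = 0.0
    for _ in range(n_genes):
        # One simulated measurement per person, independent of the label.
        values = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
        cases = [v for v, d in zip(values, labels) if d]
        controls = [v for v, d in zip(values, labels) if not d]
        # Difference in group means divided by its standard error.
        se = (sample_variance(cases) / len(cases)
              + sample_variance(controls) / len(controls)) ** 0.5
        z = (mean(cases) - mean(controls)) / se
        largest = max(largest, abs(z))
        if abs(z) > threshold:
            hits += 1
    return hits, largest


if __name__ == "__main__":
    hits, largest = count_false_positives()
    print(f"{hits} of 2000 noise genes pass the cutoff; largest |z| = {largest:.2f}")
```

Under these assumptions, roughly 5 percent of the noise genes (about 100 of the 2,000) should clear the cutoff by chance alone, and the strongest of them can look quite convincing. Trying several definitions of depression, which the sketch does not do, would add further chances for a spurious hit.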

from Big Data, Plainly Spoken (aka Numbers Rule Your World) http://bit.ly/2Wf3unn
