Must-read paper: how to make ANY statistical test come out "significant"
Submitted by drupaladmin on 16 February 2012.
The linked paper is the best paper I've read in a long time. It's essential reading for everyone who does science, from undergraduates on up. It's about experimental psychology, but it applies just as much to ecology, perhaps even more so. It says something I've long believed, but says it far better than I ever could have.
One partial solution to the problems identified in this paper is for all of us to adhere a lot more strictly to the rules of good frequentist statistical practice that we all teach, or should teach, our undergraduates. Rules like "decide the experimental design, sampling procedure, and statistical analyses in advance", "don't chuck outliers just because they're 'outliers'", "separate exploratory and confirmatory analyses, for instance by dividing the data set in half", "correct for multiple comparisons", etc. Those rules exist for a very good reason: to keep us from fooling ourselves. This is not to say that judgment calls can ever be eliminated from statistics--indeed, another one of my favorite statistical papers makes precisely this point. But those judgments need to be grounded in a strong appreciation of the rules of good practice, so that the investigator can decide when or how to violate the rules without compromising the severity of the statistical test.
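To make the "correct for multiple comparisons" rule concrete, here is a small simulation sketch of my own; it is not taken from the linked paper. It assumes there is no real effect at all, measures several independent outcomes per "study", and counts how often at least one of them comes out "significant". The function names, sample sizes, and the use of a simple z-test with known unit variance are all illustrative assumptions, chosen so the example runs on the Python standard library alone.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming known unit variance.

    (A simplifying assumption for illustration; real data would usually call
    for a t-test or something more appropriate.)
    """
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_per_test=20, alpha=0.05,
                        n_sims=4000, bonferroni=False, seed=0):
    """Fraction of simulated null studies with at least one 'significant' outcome.

    Every outcome is pure noise, so any significant result is a false positive.
    With bonferroni=True, each test is held to alpha / n_outcomes instead.
    """
    rng = random.Random(seed)
    threshold = alpha / n_outcomes if bonferroni else alpha
    hits = 0
    for _ in range(n_sims):
        pvals = [
            two_sided_p([rng.gauss(0.0, 1.0) for _ in range(n_per_test)])
            for _ in range(n_outcomes)
        ]
        # "Report whichever outcome worked": significant if ANY test passes.
        if min(pvals) < threshold:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("1 outcome:              ", false_positive_rate(1))
    print("5 outcomes, uncorrected:", false_positive_rate(5))
    print("5 outcomes, Bonferroni: ", false_positive_rate(5, bonferroni=True))
```

Under these assumptions, the single-outcome rate should sit near the nominal 0.05, the uncorrected five-outcome rate should land somewhere around 1 - 0.95^5 ≈ 0.23, and the Bonferroni-corrected rate should fall back to about 0.05. The exact numbers depend on the seed and the number of simulations, but the pattern is the point: picking the best of several tests after the fact quietly multiplies your chances of fooling yourself.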
Basically, what I'm suggesting is that, collectively, our standards about when it's ok to violate the statistical "rules" may well be far too lax. Of course, if they were less lax, doing science would get a lot harder. Or rather, it would seem to get a lot harder. In fact, doing science that leads to correct, replicable conclusions would remain just as hard as it always has been. It would only seem to get harder because we'd stop taking the easy path of cutting statistical corners and then justifying the corner cutting with excuses to ourselves about the messiness of the real world and the impracticality of idealized textbook statistical practice.
The linked paper discusses another solution: to report all judgment calls and exploratory analyses, so that reviewers can evaluate their effects on the conclusions. Sounds like a great idea to me. The authors also note, correctly, that simply doing Bayesian stats is no solution at all. The paper is emphatically not a demonstration of inherent flaws in frequentist statistics.
Further commentary from Andrew Gelman here.