In the American Statistician Association (2016a) statement, stated the following conversation:
Q: Why do so many colleges and grad schools teach p = 0.05?
A: Because that’s still what the scientific community and journal editors use.
Q: Why do so many peope still use p = 0.05?
A: Because that’s what they were taught incollege or grad school.
Someone doesn’t need to be studying philosophy, or for the Law School Acceptance Test (LSAT) to see the flaw in that argument. It’s circular reasoning, and that is the point. The p-value is being overused when there are so many other ways to measure the strength of the data and it’s significance. Plus, a p = 0.05 is arbitrary and dependent on many fields. I have seen papers use p = 0.10; p = 0.05, p = 0.01 and rarely p = 0.001. But, are the results reliable, replicable, and reproducible? There are even studies that manipulate their data to get these elusive p-values…
Scientific research is at the bedrock of pushing society forward. However, not every study’s results published can represent the best of science. Some in the field have tried to alter how long the study lasts, not take into account of a confounding variable that could be causing the results, make the sample size too small to be reliable and allowing luck to be in play, or attempt p-hacking (Adam Ruins Everything, 2017; CrashCourse, 2018; Oliver, 2016).
P-hacking is defined as gathering as many variables as possible, then massaging the huge amounts of data to get a statistically significant result (CrashCourse, 2018; Oliver, 2016). However, that result could be completely meaningless. Similar to when the 538 blog did a p-hacking study called “You can’t trust what you read about nutrition” surveyed 54 people and collected over 1000 variables, found a statistically significant correlation between eating raw tomatoes to Judaism. 538 did this study just to point out the issue of p-hacking (Aschwanden, 2016).
As mentioned earlier, the best way to protect ourselves from p-hacking is to replicate the study and see if we can get similar results to the original study (Adam Ruins Everything, 2017; John Olver, 2016). Unfortunately, in science, there is no prize for fact-checking (John Oliver, 2016). That is why when we do research, we must make sure our results are robust, by testing multiple times if possible. If it is not possible to do it in your own research, then a replication study is called for by others. However, Replication studies are rarely ever funded and rarely get published (Adam Ruins Everything, 2017). A great way to do this, is collaborating with scientific peers from multiple universities, work on the same problem, with the same methodology, but different datasets and publish one or a series of papers that confirms a result as replicable and robust. If we don’t do this, it forces the scientific field to only fund exploratory studies to get developed and published, and the results never get evaluated. Unfortunately, the adage for most scientists is to “publish or perish,” and as Prof. Brian Nosek from Center for Open Science said, “There is NO COST to getting things WRONG. THE COST is not getting them PUBLISHED.” (John Oliver, 2016).
The American Statistical Association (2016b), suggested the following to be used with p-values to give a more accurate representation of the significances:
- Methods that emphasize estimation over testing
- Confidence intervals
- Credibility intervals
- Prediction intervals
- Bayesian methods
- Alternatives measure of evidence
- Likelihood ratios
- Bayesian Factors
- Decision-Theoretic modeling
- False discovery rates
Have hope, most reputable scientists don’t take the result of one study to heart, but look at in the context of all the work done in that field (Adam Ruins Everything, 2017). Also, most reputable scientists tend to downplay the implications and generalizations of their results when they publish their findings (American Statistical Association, 2016b; Adam Ruins Everything, 2017; CrashCourse, 2018; Oliver, 2016). Looking for those kinds of studies and knowing how p-hacking is done is the best ammunition to defend against spurious results.