Checking hypotheses and the problem of p-values

The vast majority of scientific research begins with an idea of how the world works, according to the proposer. The proposer formulates a hypothesis and tries to prove it using the scientific method, usually checking experiments or observations with various statistical tools. These tools are used to process the collected data and either support the initial hypothesis or reject it in favor of the alternatives.

One of the methods is the so-called critical value approach (see, e.g., the Penn State Eberly College of Science website). This method relies on the researcher setting a significance threshold for the statistical test and accepting or rejecting the hypothesis based on it. Different branches of science usually have their own conventions for how large an error can be tolerated. For example, in the life sciences most published papers report statistical significance of \( p<0.05 \) (meaning that, if there were no real effect, a result at least this extreme would be expected less than \( 5\% \) of the time), while in physics one frequently hears about a significance of \( 5 \sigma \) (the corresponding probability is about \( 5.7 \cdot 10^{-5} \% \) for a two-sided test).
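The relation between a number of sigmas and a probability can be checked with a few lines of Python. This is only a minimal sketch: it assumes the test statistic follows a standard normal distribution, and the function names are made up for this illustration. The two-sided value is the one quoted in the text, while the one-sided value is half of it.

```python
import math


def two_sided_p_from_sigma(n_sigma: float) -> float:
    """Probability that a standard normal variable lies more than
    n_sigma standard deviations from its mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def one_sided_p_from_sigma(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds its mean
    by more than n_sigma standard deviations (upper tail only)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


# 1.96 sigma roughly corresponds to the p < 0.05 convention,
# 5 sigma to the convention mentioned for physics.
for n in (1.96, 3.0, 5.0):
    p = two_sided_p_from_sigma(n)
    print(f"{n:4.2f} sigma -> two-sided p = {p:.3e} ({100 * p:.3e} %)")
```

Which of the two tails is reported depends on the field and on the question being asked, so it is worth checking which convention a given paper uses.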

At first glance the method appears to have no drawbacks. But in the context of the current scientific publishing tradition, in which mostly positive results get published, the drawbacks are evident. All statistical methods rely on numerous samples being taken, so for this kind of test to work, numerous independent groups should repeat the same experiment and reach similar conclusions. Otherwise there is a significant possibility that a positive result is just a lucky fluke. Given the pressure to publish more, there is also a risk that the same research group would repeat the same experiment until it gets the desired statistical significance (waiting for a fluke to happen).
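The "waiting for a fluke" scenario is easy to simulate. The sketch below is a toy model under stated assumptions, not a description of any real study: every experiment draws samples from a standard normal distribution, so there is no real effect to find, and a simple two-sided z-test with known variance is applied. All names and parameter values are hypothetical. A group is counted as "successful" if any of its attempts reaches \( p<0.05 \).

```python
import math
import random

ALPHA = 0.05  # significance threshold, as in the p < 0.05 convention


def p_value_null_experiment(rng: random.Random, n_samples: int) -> float:
    """Run one experiment in which the null hypothesis is true
    (mean 0, standard deviation 1) and return its two-sided p-value."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # z statistic for the sample mean, with the standard deviation known to be 1
    z = (sum(samples) / n_samples) * math.sqrt(n_samples)
    return math.erfc(abs(z) / math.sqrt(2.0))


def fraction_of_flukes(max_attempts: int, n_groups: int = 2000,
                       n_samples: int = 20, seed: int = 0) -> float:
    """Fraction of groups that see p < ALPHA at least once when each
    group may repeat the experiment up to max_attempts times."""
    rng = random.Random(seed)
    flukes = 0
    for _ in range(n_groups):
        # any() stops at the first "significant" attempt, as the group would
        if any(p_value_null_experiment(rng, n_samples) < ALPHA
               for _ in range(max_attempts)):
            flukes += 1
    return flukes / n_groups


for attempts in (1, 5, 10):
    simulated = fraction_of_flukes(attempts)
    expected = 1.0 - (1.0 - ALPHA) ** attempts
    print(f"{attempts:2d} attempts: simulated {simulated:.3f}, expected {expected:.3f}")
```

For independent attempts the chance of at least one false positive is \( 1-(1-\alpha)^k \), which for \( \alpha = 0.05 \) and \( k = 10 \) is already about \( 40\% \). The simulation should land close to that value.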

I did my best to explain this problem, but there is a rather significant chance that Hank Green will do better in this SciShow video, which I invite you to watch.

For those who are more interested in the technical details, I would like to suggest reading a draft by Nassim Nicholas Taleb (see the Fooled by Randomness website).