Statistics - P-value

Card Puncher Data Processing


Fisher found with the a P-value a way of judging the “significance” of experimental data entirely objectively.

Fisher choose the talismanic figure 0.05 because it was mathematically convenient and thus without truly insights.

The p-value is:

  • the chance that I have to see a difference if I take two control groups.
  • NOT: The probability of the null hypothesis being true is p
  • the probability of obtaining the data you obtain given the assumption that:

When conducting a null hypothesis significance test, the p-value represents the probability of the data given the null hypothesis is true.

<MATH>p = P(D | H_0) <> P(H_0 | D ) </MATH>


  • D is the result of the study
  • <math>H_0</math> is the null hypothesis

The p-value is based on the t-value and its corresponding t distribution which depends on the sample size. The p-value is just the area under the curve at that particular t-value and beyond..

So if you:

  • are way out in the extremes, you're going to have a low p-value. You'll reject the NULL hypothesis. The parameter is not Null. There is a relationship.
  • fall around the middle, then you'll have a high p-value. You'll retain the NULL hypothesis. The parameter is Null. There is no relationship.

A p-value less than 0.5 means less than 5% of the distribution are:

  • in the extreme positive tail of the t-distribution
  • or in the extreme negative tail of the t-distribution.

p values above 0.05 or 0.1 are not significant. They're not enough evidence against the null hypothesis, which is that the coefficient is 0.

The idea was to run an experiment, then see if the results were consistent with what random chance might produce. Researchers would first set up a 'null hypothesis' that they wanted to disprove, such as there being no correlation or no difference between two groups. Next, they would play the devil's advocate and, assuming that this null hypothesis was in fact true, calculate the chances of getting results at least as extreme as what was actually observed. This probability was the P value. The smaller it was, suggested Fisher, the greater the likelihood that the straw-man null hypothesis was false.

Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis.

The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.

A p value measures whether an observed result can be attributed to chance. But it cannot answer a researcher's real question: what are the odds that an hypothesis is correct ? Those odds depend on:

  • how strong the result was
  • and, most importantly, how plausible the hypothesis is in the first place.

P Value Pipeline


According to a threshold value, it tells you if the results are significant.

The cute-off for the p-value is 0.05.

On paper, you can read: “The relation ship was statistically significant, p less than 0.05”

P Value



To get a p-value, we first calculate a t-value.

If the test is two-sided:

  • P-value = 2 * P( X > |observed value| )

If the test is one-sided:

  • H_A is μ > μ0
    • P-value = P( X > observed value )
  • H_A is μ < μ0
    • P-value = P( X < observed value )


// Simulation parameters of 10000 sample of coin 
totalTrials = 10000;
totalAboveSample = 0;
// The sample has a size of 40 and 30 of it have a face
sampleSize = 40;
totalCoinSample = 30
// Simulation 
for i in range(totalTrials):
    // TotalCoinTrials is the total for a coin throws 40 times
    totalCoinTrials = randint(2, size=sampleSize )
	// Output of our sample
	if (totalCoinTrials.sum() >=totalCoinSample):
		totalAboveSample += 1 

probability = totalAboveSample/totalTrials

Documentation / Reference

Discover More
Weka Accuracy Metrics
Data Mining - (Parameters | Model) (Accuracy | Precision | Fit | Performance) Metrics

Accuracy is a evaluation metrics on how a model perform. rare event detection Hypothesis testing: t-statistic and p-value. The p value and t statistic measure how strong is the...
Card Puncher Data Processing
R - Correlation

with: confidence interval df (degree of freedom) t-value p-value cor (Pearson's product-moment correlation)
Thomas Bayes
Statistics - (F-Statistic|F-test|F-ratio)

The NHST anova statistic test is an F-test or F-ratio. It's what you observe in the numerator relative to what you would expect just due to chance in the denominator. The f statistic is the statistic...
Thomas Bayes
Statistics - (Interaction|Synergy) effect

In a multiple regression, is assumed that the effect on the target of increasing one unit of one predictor (is independent|has no influence) on the other predictor. If this is not the case, sharing a...
Thomas Bayes
Statistics - (Significance | Significant) Effect

Fisher found with the P-value a way of judging the “significance” of experimental data entirely objectively. effect
Thomas Bayes
Statistics - (Student's) t-test (Mean Comparison)

The t-test is a test that compares means. NHST can be conducted yielding to a p-value Effect Size can be calculated like in multiple regression. Confidence Interval around the mean can also be...
Univariate Linear Regression
Statistics - (Univariate|Simple|Basic) Linear Regression

A Simple Linear regression is a linear regression with only one predictor variable (X). Correlation demonstrates the relationship between two variables whereas a simple regression provides an equation...
Thomas Bayes
Statistics - (dependent|paired sample) t-test

A dependent t-test is appropriate when: we have the same people measured twice. the same subject are been compared (ex: Pre/Post Design) or two samples are matched at the level of individual subjects...
Card Puncher Data Processing
Statistics - (t-value|t-statistic)

The (t-value|t-statistic) is a test statistic. In NHST, it is essentially a ratio of what we observed relative to what we would expect just due to chance. Each t-value has corresponding p-value depending...
Card Puncher Data Processing
Statistics - Null Hypothesis Significance Testing (NHST)

NHST is a procedure for testing the Null Hypothesis. It's a binary decision: Reject or Retain the Null Hypothesis Ie Retain the Null Hypothesis or the alternative hypothesis. Before starting any...

Share this page:
Follow us:
Task Runner