Statistics - independent t-test

About

An independent t-test is appropriate when you want to compare the means of two independent samples (two completely different groups of entities).

For example:

This is one of the most common statistical tests.

With several groups, we could do pairwise comparisons via independent t-tests. This is tedious, and it increases the chance of a type I error because we run multiple t-tests on the same set of data. Therefore, when there are more than two group means to compare, Analysis of Variance (ANOVA) is recommended.
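To make the inflation of the type I error concrete, here is a minimal sketch. It assumes the pairwise tests are independent (a simplification, since they share data) and uses a hypothetical helper name, `familywise_error_rate`, that is not part of any library.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one type I error across all pairwise t-tests.

    Assumption: the tests are treated as independent, which is only
    an approximation when they reuse the same groups.
    """
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, "groups:", comb(k, 2), "tests, error rate",
              round(familywise_error_rate(k), 3))
```

With two groups there is a single test, so the rate equals alpha. Every additional group adds more pairs, and the rate climbs quickly.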

Assumption

Normal distribution

The dependent variable is normally distributed within each group.

Homogeneity of variance

The two groups have an equivalent amount of variance. This is a really important assumption underlying the independent t-test, and also analysis of variance.

This assumption is needed because the standard deviations of the two groups are pooled (averaged) in the test statistic below. The standard error estimate is then based on that pooled standard deviation.

So we have to check whether the variances differ across groups. Levene's test can be used to detect whether this assumption is violated.
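As an illustration of what Levene's test measures, the sketch below computes the W statistic by hand, using absolute deviations from each group mean. The function name `levene_w` and the sample data are made up for this example. Turning W into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom, which is not done here. In practice a tested implementation such as SciPy's `scipy.stats.levene` is a better choice.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic, centered on each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every score from its own group mean
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(x - g_mean) for x in g])

    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total

    # Variation of the deviations between groups and within groups
    between = sum(len(zi) * (zm - z_grand_mean) ** 2
                  for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    tight = [4, 5, 6, 5, 4, 6]   # hypothetical low-variance group
    spread = [1, 9, 2, 8, 3, 7]  # hypothetical high-variance group
    print(levene_w(tight, spread))
```

A large W suggests that the spread differs across groups, meaning the homogeneity of variance assumption may be violated.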

Mean Difference

We can calculate a mean difference between the two groups.

<MATH> \begin{array}{rrl} \text{Mean Difference} & = & \href{mean}{M}_1 - \href{mean}{M}_2 \\ \end{array} </MATH>

where:

Thorough analysis

A thorough analysis will include:

t-value

See t-value for mean.

The expected difference between the two groups under the null hypothesis is zero.

<MATH> \begin{array}{rrl} \href{t-value#mean}{\text{t-value}} & = & \frac{\href{#mean_difference}{\text{Mean Difference}}}{\href{Standard_Error}{\text{Standard Error of the mean difference}}} \\ & = & (\href{mean}{M_1} - \href{mean}{M_2}) . \left(\frac{2}{ \href{Standard_Error}{SE_1} + \href{Standard_Error}{SE_2} } \right ) \\ \end{array} </MATH>
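The calculation can be sketched in a few lines. The first function takes the standard error of the mean difference as the average of the two group standard errors, mirroring the way this page averages the two standard deviations (an assumption of this sketch). The second function shows the usual textbook pooled-variance form for comparison. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def standard_error(sample):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def t_value_averaged_se(group_1, group_2):
    """t-value with the standard error taken as the mean of SE_1 and SE_2."""
    se = (standard_error(group_1) + standard_error(group_2)) / 2
    return (mean(group_1) - mean(group_2)) / se


def t_value_pooled_variance(group_1, group_2):
    """Textbook form: pool the variances, weighted by degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * variance(group_1)
                  + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / se


if __name__ == "__main__":
    group_1 = [5, 7, 9, 6, 8]  # made-up scores
    group_2 = [3, 4, 6, 5, 2]
    print(t_value_averaged_se(group_1, group_2))
    print(t_value_pooled_variance(group_1, group_2))
```

The two versions are not numerically identical, so it is worth stating which standard error was used when reporting a t-value.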

p-value

The p-value will be based on:

Degree of freedom

<MATH> \href{degree_of_freedom}{df} = (\href{sample_size}{\text{Sample Size Group 1}} - 1) + (\href{sample_size}{\text{Sample Size Group 2}} - 1) </MATH>
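As a worked example, assuming two hypothetical groups of 10 and 12 subjects:

<MATH> df = (10 - 1) + (12 - 1) = 20 </MATH>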

Effect size

The most appropriate and the most common estimate of effect size is Cohen's d.

Cohen’s d:

Because NHST is biased by sample size, we should supplement the analysis with an estimate of effect size: Cohen's d.

The effect size is calculated differently than in regression.

Cohen's d is an intuitive measure that tells us how far apart the two group means are, in standard deviation units:

<MATH> \begin{array}{rrl} \text{Cohen's d} & = & \frac{\href{#mean_difference}{\text{Mean Difference}}}{\href{Standard Deviation}{\text{Pooled Standard deviation}}} \\ & = & (\href{mean}{M_1} - \href{mean}{M_2}) \left ( \frac{2}{\href{Standard Deviation}{SD_1} + \href{Standard Deviation}{SD_2} } \right )\\ \end{array} </MATH>

Because we have two separate groups, we have to pool their variances or their standard deviations. The pooled standard deviation is just the average of the two standard deviations.
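A minimal sketch of this calculation, with the pooled standard deviation taken as the plain average of the two group standard deviations as described above. The function name `cohens_d` and the data are made up for illustration.

```python
from statistics import mean, stdev


def cohens_d(group_1, group_2):
    """Mean difference expressed in pooled standard deviation units."""
    # Pooled SD as the simple average of the two sample SDs
    pooled_sd = (stdev(group_1) + stdev(group_2)) / 2
    return (mean(group_1) - mean(group_2)) / pooled_sd


if __name__ == "__main__":
    group_1 = [5, 7, 9, 6, 8]  # made-up scores
    group_2 = [3, 4, 6, 5, 2]
    print(cohens_d(group_1, group_2))
```

Note that the sample sizes only enter through the standard deviations themselves, not through a division by the square root of n as in the t-value.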

As you can remark:

Why ? Because:

A Cohen's d of 1 means that:

0.8 is also a strong effect.

Confidence Interval

We can also get interval estimates around these means rather than just point estimates.

We take the Mean Difference (M1 - M2) and put an upper bound and a lower bound around it. It's the same method as for sample means or regression coefficients.

<MATH> \begin{array}{rrlcl} \text{Upper bound} & = & \href{#mean_difference}{\text{Mean Difference}} & + & \href{#t-value}{\text{t-value}}.\href{Standard_Error}{\text{Standard Error}} \\ \text{Lower bound} & = & \href{#mean_difference}{\text{Mean Difference}} & - & \href{#t-value}{\text{t-value}}.\href{Standard_Error}{\text{Standard Error}} \end{array} </MATH>

The exact t-value depends on:

When the interval does not include zero, the mean difference is significant in terms of null hypothesis significance testing.
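The bounds can be sketched as follows. The function works from summary numbers, so it does not commit to a particular way of computing the standard error. The input values are hypothetical, and the critical t-value of 2.306 is the two-tailed 95% value for 8 degrees of freedom taken from a t table (the standard library has no t distribution).

```python
def mean_difference_ci(mean_difference, standard_error, t_critical):
    """Return (lower, upper) bounds around a mean difference."""
    margin = t_critical * standard_error
    return mean_difference - margin, mean_difference + margin


def excludes_zero(lower, upper):
    """True when zero lies outside the interval."""
    return lower > 0 or upper < 0


if __name__ == "__main__":
    # Hypothetical summary values: M1 - M2 = 3.0, SE = 1.0, df = 8
    lower, upper = mean_difference_ci(3.0, 1.0, t_critical=2.306)
    print(lower, upper, excludes_zero(lower, upper))
```

Here the interval stays above zero, which matches a significant result at the 5% level for these made-up numbers.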