Statistics - (Variance|Dispersion|Mean Square) (MS)

About

The variance measures how widely the individual values are spread around the average.

Applied to an estimate, the variance is how much the estimate varies around its average.

It's a measure of consistency. A very large variance means that the data are spread far from the average, while a small variance means that most of the data points lie close to the average.

Formula

<MATH> \begin{array}{rrl} Variance & = & \frac{\displaystyle \sum_{i=1}^{\href{sample_size}{N}}{(\href{raw_score}{X}_i- \href{mean}{\bar{X}})^2}}{\displaystyle \href{degree_of_freedom}{\text{Degree of Freedom}}} \\ & = & \frac{\displaystyle \sum_{i=1}^{\href{sample_size}{N}}{(\href{Deviation Score}{\text{Deviation Score}}_i)^2}}{\displaystyle \href{degree_of_freedom}{\text{Degree of Freedom}}} \\ & = & (\href{Standard_Deviation}{\text{Standard Deviation}})^2 \end{array} </MATH>

where:

<math>X</math> is a data point (raw score).
<math>\bar{X}</math> is the mean of the distribution (all data points)
<math>\text{Degree of Freedom}</math> is the number of degrees of freedom, which is:
- <math>N</math> (Sample Size) for descriptive statistics
- <math>N - 1</math> for inferential statistics
See also:
- Statistics - Deviation Score (for one observation)
- Statistics - Standard Deviation (SD|s| |RMS width)
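
As an illustration of the two denominators, the sketch below uses Python's standard `statistics` module: `pvariance` divides by <math>N</math> and `variance` divides by <math>N - 1</math>. The data set is a made-up example, not taken from this article.

```python
import statistics

# Hypothetical data set, chosen so that the mean is exactly 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: sum of squared deviations divided by N
population_variance = statistics.pvariance(data)

# Inferential statistics: sum of squared deviations divided by N - 1
sample_variance = statistics.variance(data)

print(population_variance)
print(sample_variance)
```

The sum of squared deviations is 32 in both cases. Only the denominator changes (8 versus 7), so the inferential variance is always slightly larger than the descriptive one.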

Addition

<MATH> Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) </MATH> where:

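<math>Cov(X, Y)</math> is the covariance of X and Y (see Statistics - Covariance). When X and Y are uncorrelated, the covariance is zero and the variances simply add.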
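
The addition rule can be checked numerically. The following is a minimal sketch, assuming two small made-up series of equal length and using <math>N</math> as the denominator for both the variance and the covariance. The helper names `mean`, `variance` and `covariance` are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Descriptive variance: divide by N
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(xs, ys):
    # Descriptive covariance: divide by N, to match variance() above
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
total = [a + b for a, b in zip(x, y)]

left = variance(total)                                     # Var(X + Y)
right = variance(x) + variance(y) + 2 * covariance(x, y)   # Var(X) + Var(Y) + 2 Cov(X, Y)
print(left, right)
```

Both sides come out at about 8.96, up to floating-point rounding. The identity also holds with <math>N - 1</math>, as long as the same denominator is used for the variance and the covariance.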

Computation

For each data point, calculate the deviation score (the difference from the average).
Square each deviation score. The raw deviation scores always sum to zero, so squaring removes the negative signs and keeps the differences from cancelling each other out.
Sum the squared differences.
Divide the sum of squared differences by the degrees of freedom to get the variance.
The degrees of freedom are:
- <math>N</math> (Sample Size) for descriptive statistics
- <math>N - 1</math> for inferential statistics
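
As a small worked example of these steps, take the three made-up data points 2, 4 and 6 (not from this article):

<MATH> \begin{array}{rrl} \bar{X} & = & \frac{2 + 4 + 6}{3} = 4 \\ \text{Deviation Scores} & = & -2, \; 0, \; 2 \quad \text{(they sum to zero)} \\ \text{Sum of Squares} & = & (-2)^2 + 0^2 + 2^2 = 8 \\ Variance & = & \frac{8}{3} \approx 2.67 \quad \text{(descriptive, } N = 3 \text{)} \\ Variance & = & \frac{8}{2} = 4 \quad \text{(inferential, } N - 1 = 2 \text{)} \end{array} </MATH>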

Python

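units = [7, 10, 9, 4, 5, 6, 5, 6, 8, 4, 1, 6, 6]

def units_average(units):
    # Arithmetic mean of the data points (true division in Python 3)
    return sum(units) / len(units)

def units_variance(units, average):
    # Accumulate the squared deviation scores
    squared_diff = 0
    for unit in units:
        squared_diff += (unit - average) ** 2
    # Divide by N (descriptive statistics);
    # use len(units) - 1 instead for inferential statistics
    return squared_diff / len(units)

print(units_variance(units, units_average(units)))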