Statistics - (Variance|Dispersion|Mean Square) (MS)
About
The variance measures how widely the individual data points are spread around the average.
For an estimate, the variance is how much the estimate varies around its average.
It's a measure of consistency. A very large variance means that the data are all over the place, while a small variance means that most of the data lie close to the average.
Formula
<MATH> \begin{array}{rrl} Variance & = & \frac{\displaystyle \sum_{i=1}^{\href{sample_size}{N}}{(\href{raw_score}{X}_i- \href{mean}{\bar{X}})^2}}{\displaystyle \href{degree_of_freedom}{\text{Degree of Freedom}}} \\ & = & \frac{\displaystyle \sum_{i=1}^{\href{sample_size}{N}}{(\href{Deviation Score}{\text{Deviation Score}}_i)^2}}{\displaystyle \href{degree_of_freedom}{\text{Degree of Freedom}}} \\ & = & (\href{Standard_Deviation}{\text{Standard Deviation}})^2 \end{array} </MATH>
where:
- <math>X</math> is a data point (raw score).
- <math>\bar{X}</math> is the mean of the distribution (all data points).
- <math>\text{Degree of Freedom}</math> is the divisor, which is:
  - <math>N</math> (Sample Size) for descriptive statistics
  - <math>N - 1</math> for inferential statistics
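As a rough illustration of the two divisors, here is a minimal Python sketch. The function name `variance`, its `inferential` flag, and the data values are assumptions made for this example only.

```python
import math

def variance(data, inferential=False):
    """Sum of squared deviation scores divided by the degree of freedom."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    # N for descriptive statistics, N - 1 for inferential statistics
    degree_of_freedom = n - 1 if inferential else n
    return squared_deviations / degree_of_freedom

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

descriptive = variance(data)                    # divides by N = 8
inferential = variance(data, inferential=True)  # divides by N - 1 = 7

print(descriptive)             # 4.0
print(round(inferential, 3))   # 4.571
print(math.sqrt(descriptive))  # standard deviation: 2.0
```

The N - 1 divisor always gives a slightly larger value than N, and the gap shrinks as the sample size grows.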
Addition
<MATH> Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) </MATH> where:
- <math>Cov(X, Y)</math> is the covariance of X and Y (see Statistics - Covariance)
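The addition rule can be checked numerically. The sketch below is only an illustration: the helper names (`mean`, `covariance`, `variance`) and the two small data lists are assumptions, and both variance and covariance use the descriptive divisor N so that the identity holds exactly up to floating-point rounding.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Descriptive covariance: mean product of paired deviation scores (divides by N)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
total = [a + b for a, b in zip(x, y)]  # X + Y, element by element

left = variance(total)
right = variance(x) + variance(y) + 2 * covariance(x, y)
print(math.isclose(left, right))  # True
```

When the covariance is zero, the last term vanishes and the variances simply add.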
Computation
- For each data point, calculate the deviation score (the difference from the average)
- Square each deviation score to get rid of negative differences (the raw deviation scores always sum to zero, so they cannot be summed directly)
- Calculate the sum of the squared differences
- The final variance is the sum of squared differences divided by the degree of freedom
- The degree of freedom is:
  - <math>N</math> (Sample Size) for descriptive statistics
  - <math>N - 1</math> for inferential statistics
Python
units = [7, 10, 9, 4, 5, 6, 5, 6, 8, 4, 1, 6, 6]

def units_average(units):
    # Arithmetic mean (true division in Python 3)
    return sum(units) / len(units)

def units_variance(units, average):
    # Sum of squared deviation scores divided by N (descriptive statistics)
    squared_diff = 0
    for unit in units:
        squared_diff += (unit - average) ** 2
    return squared_diff / len(units)

print(round(units_variance(units, units_average(units)), 3))
4.994
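As a cross-check, Python's standard `statistics` module provides `pvariance` (divisor N) and `variance` (divisor N - 1). A minimal sketch, assuming Python 3 and the same `units` list:

```python
import statistics

units = [7, 10, 9, 4, 5, 6, 5, 6, 8, 4, 1, 6, 6]

population_variance = statistics.pvariance(units)  # divides by N
sample_variance = statistics.variance(units)       # divides by N - 1

print(round(population_variance, 3))  # 4.994
print(round(sample_variance, 3))      # 5.41
```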