
Statistical Learning - Simple Linear Discriminant Analysis (LDA)

About

Linear Discriminant Analysis with only one variable (p = 1).

For a generalization to multiple variables (multivariate Gaussian), see Statistics - Fisher (Multiple Linear Discriminant Analysis).

Assumption

The variance <math>\sigma_k^2</math> of the distribution of <math>X_i</math> when <math>Y_i = k</math> is the same in each of the classes k, ie <math>\sigma_k = \sigma</math> for all k.

This is an important convenience: this assumption determines whether discriminant analysis gives us linear discriminant functions or quadratic ones.

Model Construction

Gaussian density

The Gaussian density has the form:

<MATH> f_k(x) = \frac{1}{\sqrt{2\pi}\sigma_k} e^{\displaystyle -\frac{1}{2} \left (\frac{x-\mu_k}{\sigma_k} \right )^2} </MATH>

where <math>\mu_k</math> is the mean and <math>\sigma_k^2</math> the variance of X in class k.
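As a quick illustration, here is a minimal sketch of this density in plain Python (the function name gaussian_density and the example values are ours, not part of LDA itself):

<code python>
import math

def gaussian_density(x, mu_k, sigma_k):
    # One-dimensional Gaussian density f_k(x) with mean mu_k
    # and standard deviation sigma_k
    return math.exp(-0.5 * ((x - mu_k) / sigma_k) ** 2) / (math.sqrt(2 * math.pi) * sigma_k)

# Example: class k with mean 1.0 and standard deviation 0.5, evaluated at x = 1.2
print(gaussian_density(1.2, mu_k=1.0, sigma_k=0.5))  # ~0.7365
</code>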

Bayes Formula

Full Expression

Plugging the Gaussian density into the Bayes formula, we get a rather complex expression.

<MATH> \begin{array}{rrl} Pr(Y = k|X = x) & = & \frac{\displaystyle Pr(X = x|Y = k)  Pr(Y = k)}{\displaystyle Pr(X = x)} \\ p_k(x) & = & \frac {\displaystyle \pi_k \frac{1}{\sqrt{2\pi}\sigma} e^{\displaystyle -\frac{1}{2} \left (\frac{x-\mu_k}{\sigma} \right )^2}} {\displaystyle \sum^K_{l=1} \pi_l \frac{1}{\sqrt{2\pi}\sigma} e^{\displaystyle -\frac{1}{2} \left (\frac{x-\mu_l}{\sigma} \right )^2}} \end{array} </MATH>
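A minimal sketch of this computation in Python, assuming the common standard deviation sigma from the assumption above (the parameter values are illustrative):

<code python>
import math

def posterior(x, pis, mus, sigma):
    # p_k(x) for every class k: prior-weighted Gaussian densities,
    # normalized so that the posteriors sum to one (Bayes formula)
    densities = [math.exp(-0.5 * ((x - mu_k) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
                 for mu_k in mus]
    numerators = [pi_k * f_k for pi_k, f_k in zip(pis, densities)]
    total = sum(numerators)
    return [num / total for num in numerators]

# Two classes with equal priors, means -1 and 1, common sigma = 1
print(posterior(0.3, pis=[0.5, 0.5], mus=[-1.0, 1.0], sigma=1.0))  # ~[0.354, 0.646]
</code>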

Simplification

Luckily, thanks to the equal-variance assumption, there are some simplifications and cancellations.

To classify an observation, we don't need to evaluate the probabilities themselves. We just need to see which one is the largest.

Whenever you see exponentials, the first thing to do is take logs. Discarding terms that do not depend on k then cancels most of the expression.
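Concretely, taking the log of the numerator of <math>p_k(x)</math> (the denominator is the same for every k and can be ignored) gives:

<MATH> \begin{array}{rrl} log \left ( \pi_k f_k(x) \right ) & = & log(\pi_k) - log(\sqrt{2\pi}\sigma) - \frac{(x-\mu_k)^2}{2\sigma^2} \\ & = & log(\pi_k) - log(\sqrt{2\pi}\sigma) - \frac{x^2}{2\sigma^2} + x.\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} \end{array} </MATH>

The terms <math>-log(\sqrt{2\pi}\sigma)</math> and <math>-\frac{x^2}{2\sigma^2}</math> do not depend on k and can be dropped.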

This is equivalent to assigning the observation to the class with the largest discriminant score:

<MATH> \delta_k(x) = x.\frac{\mu_k}{\sigma^2}-\frac{\mu_k^2}{2\sigma^2}+log(\pi_k) </MATH>

It involves the parameters <math>\pi_k</math>, <math>\mu_k</math> and <math>\sigma^2</math>.

And importantly, <math>\delta_k(x)</math> is a linear function of x.

For each of the K classes, we get one such discriminant function (see the sketch below).
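A minimal sketch of the score and the resulting classification rule in Python (the function names are ours; the toy parameters match the example above):

<code python>
import math

def discriminant_score(x, mu_k, sigma2, pi_k):
    # Linear discriminant score delta_k(x): linear in x
    return x * mu_k / sigma2 - mu_k ** 2 / (2 * sigma2) + math.log(pi_k)

def classify(x, pis, mus, sigma2):
    # Assign x to the class with the largest discriminant score
    scores = [discriminant_score(x, mu_k, sigma2, pi_k) for pi_k, mu_k in zip(pis, mus)]
    return scores.index(max(scores))

# Equal priors, means -1 and 1, common variance 1: x = 0.3 goes to the class with mean 1
print(classify(0.3, pis=[0.5, 0.5], mus=[-1.0, 1.0], sigma2=1.0))  # 1
</code>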

Binary

If <math>K = 2</math> and <math>\pi_1 = \pi_2 = 0.5</math>, you can simplify even further and see that the decision boundary is at

<MATH> x = \frac{\mu_1 + \mu_2}{2} </MATH>
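To see this, set <math>\delta_1(x) = \delta_2(x)</math>; since <math>\pi_1 = \pi_2</math>, the <math>log(\pi_k)</math> terms cancel and:

<MATH> \begin{array}{rrl} x.\frac{\mu_1}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2} & = & x.\frac{\mu_2}{\sigma^2} - \frac{\mu_2^2}{2\sigma^2} \\ x(\mu_1 - \mu_2) & = & \frac{\mu_1^2 - \mu_2^2}{2} \\ x & = & \frac{\mu_1 + \mu_2}{2} \end{array} </MATH>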

Parameter Estimation

<MATH> \hat{\pi_k} = \frac{\displaystyle n_k}{n} </MATH>

where <math>n_k</math> is the number of observations in the kth class and n the total number of observations.

<MATH> \hat{\mu_k} = \frac{1}{N_k}\sum_{i:y_i=k}x_i </MATH>

The notation <math>\displaystyle \sum_{i:y_i=k}</math> just sums over the <math>x_i</math>'s that are in class k.

Because the variance is assumed to be the same in each of the classes, this formula is called a pooled variance estimate:

<MATH> \hat{\sigma}^2 = \frac{1}{n-K}\sum_{k=1}^K\sum_{i:y_i=k}(x_i-\hat{\mu}_k)^2 </MATH>

An equivalent way to write this is as a weighted average of the per-class variance estimates: <MATH> \hat{\sigma}^2 = \sum_{k=1}^K \frac{n_k-1}{n-K}.\hat{\sigma}^2_k </MATH> where <math>\hat{\sigma}^2_k</math> is the usual formula for the estimated variance in the kth class, i.e.: <MATH> \hat{\sigma}^2_k = \frac{1}{n_k-1} \sum_{i:y_i=k} (x_i-\hat{\mu_k})^2 </MATH>
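A minimal sketch of the three estimators in Python, assuming the data comes as plain lists of observations and class labels (the function name is ours):

<code python>
from collections import defaultdict

def estimate_parameters(xs, ys):
    # Estimate pi_k, mu_k and the pooled variance sigma^2 from 1-D data
    n = len(xs)
    by_class = defaultdict(list)
    for x_i, y_i in zip(xs, ys):
        by_class[y_i].append(x_i)
    K = len(by_class)
    pi_hat = {k: len(v) / n for k, v in by_class.items()}       # prior proportions
    mu_hat = {k: sum(v) / len(v) for k, v in by_class.items()}  # class means
    # Pooled variance: squared deviations around each class mean, divided by n - K
    sigma2_hat = sum((x_i - mu_hat[k]) ** 2
                     for k, v in by_class.items() for x_i in v) / (n - K)
    return pi_hat, mu_hat, sigma2_hat

xs = [1.2, 0.8, 1.1, -0.9, -1.3, -1.0]
ys = [1, 1, 1, 2, 2, 2]
print(estimate_parameters(xs, ys))
</code>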