Machine Learning - Logistic regression (Classification Algorithm)

About

The prediction from a logistic regression model can be interpreted as the probability that the label is 1.

linear regression can also be used to perform classification problem.

Just by transforming the categorical target with continuous values.

The idea of logistic regression is to make linear regression produce probabilities. It's always best to predict class probabilities instead of predicting classes.

Logistic regression estimate class probabilities directly using the logit transform.

The Linear regression calculate a linear function and then a threshold in order to classify.

The result is logistic regression, a popular classification technique.

Logistic regression can be framed as minimizing a convex function but has no closed-form solution.

Articles Related

Outcome

Binary

For a two binary classification problem

False = 0
True = 1

If the predicted score <math>\hat{Y}</math> > 0.5 then True.

In this case of a binary outcome, linear regression does a good job as a classifier, and is equivalent to linear discriminant analysis

See:

Machine Learning - (Univariate|Simple) Logistic regression (with one variables)
Statistics Learning - Multi-variant logistic regression (the generalization with more than one variable)

There's even some theoretical justification.

A regression is an estimation of the conditional mean of Y given X.
Therefore, the conditional mean of the 0, 1 variable given X is simply the probability that Y is 1 given X just by simple probability theory.

Multi-class

With a response variable with three possible values, we may choose this coding:

Blue=1
Green=2
Red=3

This coding suggests an ordering, and in fact implies that the difference (distance) between blue and green is the same as between green and red.

Linear regression is then not appropriate. Multiclass Logistic Regression or Discriminant Analysis are more appropriate.

We can't perform a regression for each class (like multi‐response regression) because probabilities won’t sum to 1. It can be done as a joint optimization problem.

Disadvantage

When the classes are well separated, it turns out that the parameter estimates for logistic regression are surprisingly unstable.

In fact, if you've go a feature that separates the classes perfectly, the coefficients go off to infinity.

Logistic aggression was developed in largely the biological and medical fields where you never found such strong predictors.

Now, you can do things to make logistic regression better behave. But it turns out linear discriminant analysis doesn't suffer from this problem and is better behaved in those situations.