Statistics - Maximum likelihood


Maximum likelihood was introduced by Ronald Fisher back in the 1920s.

Since the observations are assumed to be independent of one another, the likelihood of the observed data is the product, over all observations, of the probability of the observed class (for a binary response: 0's and 1's):

<MATH> \begin{array}{rrl} likelihood(\beta_0,\beta_1) & = & \prod_{i:y_i=1}p(X_i) \prod_{i:y_i=0}(1-p(X_i)) \end{array} </MATH>


  • <math>\displaystyle \prod_{i:y_i=1}</math> denotes the product over all observations i for which <math>y_i = 1</math> (and similarly <math>\displaystyle \prod_{i:y_i=0}</math> for the observations with <math>y_i = 0</math>).
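The likelihood product above can be evaluated directly for a given pair of parameters. Below is a minimal sketch in Python, assuming a logistic model for <math>p(X_i)</math>; the tiny dataset and the function name `likelihood` are illustrative assumptions, not from the text.

```python
import numpy as np

# Hypothetical tiny dataset: one feature x, binary labels y (illustration only).
x = np.array([0.5, 1.0, 2.5, 3.0, 1.5, 2.0, 3.5, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def likelihood(b0, b1):
    # Logistic model: p(X_i) = Pr(y_i = 1 | x_i)
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    # Product over observations with y_i = 1, times product over those with y_i = 0
    return np.prod(p[y == 1]) * np.prod(1.0 - p[y == 0])

# At b0 = b1 = 0 every p(X_i) is 0.5, so the likelihood is 0.5**n
print(likelihood(0.0, 0.0))
```

Note how each observation contributes either <math>p(X_i)</math> or <math>1-p(X_i)</math> to the product, depending on its observed class.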

The idea of maximum likelihood is to pick the parameters <math>\beta_0</math> and <math>\beta_1</math> that make this probability as large as possible.

In practice, one maximizes the logarithm of this probability, called the “log-likelihood function”, which turns the product into a sum and is easier to work with numerically.
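The maximization step can be sketched as follows: minimize the negative log-likelihood with a general-purpose optimizer. This is one possible implementation, assuming a logistic model and using `scipy.optimize.minimize`; the dataset is a hypothetical illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical tiny dataset: one feature x, binary labels y (illustration only).
x = np.array([0.5, 1.0, 2.5, 3.0, 1.5, 2.0, 3.5, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def neg_log_likelihood(beta):
    b0, b1 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # logistic model for p(X_i)
    # The log turns the likelihood product into a sum; negate because we minimize
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Maximum likelihood estimates: the beta values that minimize the negative log-likelihood
result = minimize(neg_log_likelihood, x0=np.zeros(2))
b0_hat, b1_hat = result.x
```

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood itself, and the sum form avoids the numerical underflow that multiplying many probabilities below 1 would cause.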
