Loss functions (penalty for incorrect predictions)


Loss functions define how incorrect predictions are penalized. The optimization problems associated with various linear classifiers are defined as minimizing the loss over the training points (sometimes together with a regularization term).

They can also be used to evaluate the quality of models.



  • Squared loss = <math>(y-\hat{y})^2</math>
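As a minimal Python sketch of the squared loss for a single example (the helper name squared_loss is illustrative, not from a library):

def squared_loss(y, y_hat):
    """Return the squared loss (y - y_hat)**2 for one example."""
    return (y - y_hat) ** 2

print(squared_loss(1.0, 0.8))  # ~0.04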



0-1 loss: the penalty is 0 for a correct prediction and 1 otherwise
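A minimal sketch of this all-or-nothing penalty (the helper name zero_one_loss is illustrative):

def zero_one_loss(y, y_hat):
    """Return 0 if the prediction matches the label, 1 otherwise."""
    return 0 if y == y_hat else 1

print(zero_one_loss(1, 1))  # 0: correct prediction
print(zero_one_loss(1, 0))  # 1: incorrect prediction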

As the 0-1 loss is not convex, the standard approach is to transform the categorical features into numerical ones (see Statistics - Dummy (Coding|Variable) - One-hot-encoding (OHE)) and to use a regression loss instead.


Log loss is defined as: <MATH> \begin{align} \ell_{log}(p, y) = \begin{cases} -\log (p) & \text{if } y = 1\\ -\log(1-p) & \text{if } y = 0 \end{cases} \end{align} </MATH> where

  • <math>p</math> is a probability between 0 and 1. <note tip>A baseline probability for a binary event is simply the mean of the training targets.</note><note tip>It can then be compared to the output of a probabilistic model such as logistic regression.</note>
  • <math>y</math> is a label of either 0 or 1.

Log loss is a standard evaluation criterion when predicting rare events, such as in click-through-rate prediction.
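For instance, a minimal sketch of the baseline evaluation described in the note above (the data is hypothetical): the label mean serves as the constant prediction, and the resulting log loss is the bar a probabilistic model should beat.

from math import log

labels = [0, 0, 0, 1, 0, 0, 1, 0]  # hypothetical binary targets
p = sum(labels) / len(labels)      # baseline probability = mean of the labels
baseline = sum(-log(p) if y == 1 else -log(1 - p) for y in labels) / len(labels)
print(p, baseline)                 # 0.25 ~0.562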


from math import log

def computeLogLoss(p, y):
    """Calculates the value of log loss for a given probability and label.

    Args:
        p (float): A probability between 0 and 1.
        y (int): A label.  Takes on the values 0 and 1.

    Returns:
        float: The log loss value.
    """
    # Clamp p away from 0 and 1 so that log() stays defined.
    epsilon = 10e-12
    if p == 0:
        p = epsilon
    elif p == 1:
        p = p - epsilon
    if y == 1:
        return -log(p)
    else:
        return -log(1 - p)
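For example (values rounded): a confident correct prediction incurs a small loss, a confident wrong one a large loss, and p = 0.5 yields the uninformative baseline of log 2.

print(computeLogLoss(0.9, 1))  # ~0.105
print(computeLogLoss(0.9, 0))  # ~2.303
print(computeLogLoss(0.5, 1))  # ~0.693 (log 2)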
