# Loss functions (Incorrect predictions penalty)

Loss functions define how to penalize incorrect predictions. The optimization problems associated with various linear classifiers are defined as minimizing the loss on the training points (sometimes along with a regularization term).

They can also be used to evaluate the quality of models.
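The optimization problem described above can be sketched as a single objective function. This is a minimal sketch under stated assumptions, not a definitive formulation: it assumes a linear model $\hat{y} = w \cdot x + b$, uses the squared loss as the per-point loss, and adds an L2 penalty $\lambda \lVert w \rVert^2$ as the regularization term. The names `regularized_objective` and `lam` are hypothetical.

```python
from typing import Sequence


def regularized_objective(
    w: Sequence[float],
    b: float,
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
) -> float:
    """Mean squared loss of a linear model on the training points plus an L2 penalty.

    Assumptions for illustration: the model is y_hat = w . x + b, the loss is the
    squared loss, and lam scales the regularization term lam * ||w||^2.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        y_hat = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        total += (y_i - y_hat) ** 2  # squared loss for this training point
    data_loss = total / len(y)
    penalty = lam * sum(w_j ** 2 for w_j in w)  # L2 regularization term
    return data_loss + penalty
```

A training procedure such as gradient descent would search for the `w` and `b` that make this value as small as possible; setting `lam = 0` recovers the unregularized problem of minimizing the loss alone.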

## Types

### Regression

• Squared loss = $(y-\hat{y})^2$
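The squared loss can be computed directly from its formula. The sketch below is illustrative; the function names are hypothetical, and averaging over a dataset is an assumed, common way to turn the per-prediction loss into a single score.

```python
from typing import Sequence


def squared_loss(y: float, y_hat: float) -> float:
    """Penalty (y - y_hat)^2 for a single prediction."""
    return (y - y_hat) ** 2


def mean_squared_loss(ys: Sequence[float], y_hats: Sequence[float]) -> float:
    """Average squared loss over a set of predictions."""
    if len(ys) != len(y_hats):
        raise ValueError("ys and y_hats must have the same length")
    return sum(squared_loss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)
```

For example, `squared_loss(3.0, 2.5)` is `0.25`. Because the error is squared, an error of 2 costs four times as much as an error of 1, so large mistakes dominate the total.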

### Classification

#### 0-1

0-1 loss: the penalty is 0 for a correct prediction and 1 otherwise.
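A minimal sketch of the 0-1 loss, with a helper that averages it over a dataset. Both function names are hypothetical; the average is simply the fraction of misclassified points.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def zero_one_loss(y: T, y_hat: T) -> int:
    """Return 0 for a correct prediction and 1 otherwise."""
    return 0 if y == y_hat else 1


def error_rate(ys: Sequence[T], y_hats: Sequence[T]) -> float:
    """Average 0-1 loss over a dataset, i.e. the fraction of misclassified points."""
    if len(ys) != len(y_hats):
        raise ValueError("ys and y_hats must have the same length")
    return sum(zero_one_loss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)
```

The loss only records whether a prediction is right, so a near miss and a wildly wrong prediction cost the same.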

Because 0-1 loss is not convex, the standard approach is to transform the categorical features into numerical features (see Statistics - Dummy (Coding|Variable) - One-hot-encoding (OHE)) and to use a regression loss instead.
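One way to read this approach, sketched below under explicit assumptions: the one-hot encoding is applied to the class labels, the classifier is assumed to output one numeric score per class, and the squared (regression) loss compares the scores with the one-hot target. The names `one_hot` and `squared_loss_one_hot` are hypothetical.

```python
from typing import List, Sequence


def one_hot(label: str, classes: Sequence[str]) -> List[float]:
    """Encode a categorical label as a 0/1 indicator vector."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1.0 if c == label else 0.0 for c in classes]


def squared_loss_one_hot(label: str, scores: Sequence[float], classes: Sequence[str]) -> float:
    """Squared loss between the one-hot target and one numeric score per class."""
    if len(scores) != len(classes):
        raise ValueError("need exactly one score per class")
    target = one_hot(label, classes)
    return sum((t - s) ** 2 for t, s in zip(target, scores))
```

With classes `["red", "green", "blue"]`, `one_hot("green", ...)` gives `[0.0, 1.0, 0.0]`. Unlike the 0-1 loss, the result changes smoothly with the scores, and as a sum of squares it is convex in them.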

#### Log

Log loss is defined as:

$$
\ell_{log}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1-p) & \text{if } y = 0 \end{cases}
$$

where

• $p$ is a probability between 0 and 1. A baseline probability for a binary event is simply the mean of the training targets; it can then be compared with the output of a probabilistic model such as logistic regression.
• $y$ is a label of either 0 or 1.

Log loss is a standard evaluation criterion when predicting rare events, such as in click-through rate prediction.

```python
from math import log

def computeLogLoss(p, y):
    """Calculates the value of log loss for a given probability and label.

    Args:
        p (float): A probability between 0 and 1.
        y (int): A label. Takes on the values 0 and 1.

    Returns:
        float: The log loss value.
    """
    # Nudge p away from exactly 0 or 1 so that log(0) is never evaluated.
    epsilon = 1e-12
    if p == 0:
        p = epsilon
    elif p == 1:
        p = p - epsilon

    if y == 1:
        return -log(p)
    elif y == 0:
        return -log(1 - p)
    raise ValueError("y must be 0 or 1")
```
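The function above scores a single prediction. The baseline described for $p$ (a constant prediction equal to the mean of the training targets) can be sketched as follows. The names `log_loss`, `mean_log_loss` and `baseline_log_loss` are hypothetical, and clipping $p$ to $[\epsilon, 1-\epsilon]$ is a variant of the epsilon adjustment above rather than the same code.

```python
from math import log
from typing import Sequence


def log_loss(p: float, y: int, eps: float = 1e-12) -> float:
    """Log loss for one predicted probability p and label y in {0, 1}."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    p = min(max(p, eps), 1.0 - eps)  # clip so log(0) is never evaluated
    return -log(p) if y == 1 else -log(1.0 - p)


def mean_log_loss(ps: Sequence[float], ys: Sequence[int]) -> float:
    """Average log loss over a dataset."""
    if len(ps) != len(ys):
        raise ValueError("ps and ys must have the same length")
    return sum(log_loss(p, y) for p, y in zip(ps, ys)) / len(ys)


def baseline_log_loss(train_ys: Sequence[int], test_ys: Sequence[int]) -> float:
    """Log loss of always predicting the mean training label."""
    base_p = sum(train_ys) / len(train_ys)
    return mean_log_loss([base_p] * len(test_ys), test_ys)
```

With training labels `[0, 0, 0, 1]` the baseline probability is 0.25. A probabilistic model such as logistic regression adds information only if its mean log loss on the same test labels is lower than this baseline.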

