# Data Mining - (Stochastic) Gradient descent (SGD)

Gradient descent can be used to train various kinds of regression and classification models.

It is an iterative process, and because each iteration's gradient is a sum of independent per-observation terms, the computation is well suited to a map-reduce process: the summands are computed in the map step and added up in the reduce step.
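As a sketch of that map/reduce split, in plain Python (function and variable names are illustrative, not from any particular framework):

```python
from functools import reduce

def summand(w, x, y):
    """Per-observation gradient term (w·x - y) * x — the 'map' step."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return [(dot - y) * xi for xi in x]

def gradient(w, data):
    """Add the per-observation terms together — the 'reduce' step."""
    terms = map(lambda obs: summand(w, obs[0], obs[1]), data)
    return reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], terms)

# two observations (features, target)
data = [([3.0, 1.0, 4.0], 2.0), ([1.0, 2.0, 1.0], 1.0)]
print(gradient([1.0, 1.0, 1.0], data))  # → [21.0, 12.0, 27.0]
```

In a distributed setting each worker would compute the summands for its partition of the observations, and only the small per-partition sums would be shuffled to the reducer.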

## Linear regression

The gradient descent update for linear regression is: $$\mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=1}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j \,$$

where:

• $i$ is the iteration number of the gradient descent algorithm,
• $j$ identifies the observation,
• $N$ is the number of observations,
• $(\mathbf{w}^\top \mathbf{x} - y) \mathbf{x}$ is the summand,
• $y$ is the target value,
• $\mathbf{x}$ is the features vector,
• $\mathbf{w}_i$ is the weights vector at iteration $i$; the weights start at zero: $\mathbf{w}_0 = [0, 0, 0, \dots, 0]$,
• $\alpha_i$ is the step size: $\alpha_i = \frac{\displaystyle \alpha}{\displaystyle n \sqrt{i+1}}$ , where $n$ is the number of observations and $\alpha$ is the initial step size, $\alpha = 1$.
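One update step can be sketched in plain Python as follows (names are illustrative; the step-size schedule $\alpha / (n \sqrt{i+1})$ is an assumption matching the definitions above):

```python
import math

def update(w, data, alpha0, i):
    """One update: w_{i+1} = w_i - alpha_i * sum_j (w·x_j - y_j) x_j."""
    n = len(data)
    alpha = alpha0 / (n * math.sqrt(i + 1))      # decreasing step size
    grad = [0.0] * len(w)
    for x, y in data:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        grad = [g + (dot - y) * xi for g, xi in zip(grad, x)]
    return [wi - alpha * g for wi, g in zip(w, grad)]

# starting from w_0 = [0, 0, 0] with a single observation
w1 = update([0.0, 0.0, 0.0], [([3.0, 1.0, 4.0], 2.0)], alpha0=1.0, i=0)
print(w1)  # → [6.0, 2.0, 8.0]
```

With $\mathbf{w}_0 = [0,0,0]$ the residual is $0 - 2 = -2$, so the gradient is $-2 \cdot [3,1,4]$ and, with $\alpha_0 = 1$, the first update moves the weights to $[6, 2, 8]$.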

### Summand

```
exampleW = [1, 1, 1]
exampleX = [3, 1, 4]
exampleY = 2.0
gradientSummand = (dot([1 1 1], [3 1 4]) - 2) * [3 1 4]
                = (8 - 2) * [3 1 4]
                = [18 6 24]
```
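The same computation with NumPy (the function name `gradient_summand` is illustrative):

```python
import numpy as np

def gradient_summand(w, x, y):
    """Summand (w·x - y) * x for a single observation."""
    return (np.dot(w, x) - y) * x

w = np.array([1.0, 1.0, 1.0])
x = np.array([3.0, 1.0, 4.0])
print(gradient_summand(w, x, 2.0))  # → [18.  6. 24.]
```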



## Convexity

Log loss is convex, which means that we can use gradient descent to find weights that result in a global minimum. The 0/1 loss is not convex due to its abrupt decision boundary at $z = 0$, so it is difficult to minimize with gradient descent.
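The squared-error loss of linear regression is convex too, so batch gradient descent should approach the global minimum. A minimal self-contained check, assuming a tiny hand-made data set generated from the true weights $[2, -1]$ (all names illustrative):

```python
import math

def gradient_descent(data, n_iter=1000, alpha=1.0):
    """Batch gradient descent for least-squares linear regression."""
    n, d = len(data), len(data[0][0])
    w = [0.0] * d                               # w_0 = [0, ..., 0]
    for i in range(n_iter):
        step = alpha / (n * math.sqrt(i + 1))   # decreasing step size
        grad = [0.0] * d
        for x, y in data:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            grad = [g + (dot - y) * xi for g, xi in zip(grad, x)]
        w = [wi - step * g for wi, g in zip(w, grad)]
    return w

# illustrative data consistent with true weights [2, -1]
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = gradient_descent(data)
print(w)  # close to [2.0, -1.0]: convexity means no spurious local minima
```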