# Data Mining - (Stochastic) Gradient descent (SGD)

Gradient descent can be used to train various kinds of regression and classification models.

It is an iterative process, and because each iteration's gradient is a sum of independent per-observation terms, the computation is well suited to a map-reduce process: the summands are computed in the map step and added up in the reduce step.
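As a sketch of that map/reduce split, in plain Python (function and variable names are illustrative, not from any particular framework):

```python
from functools import reduce

def summand(w, x, y):
    """Per-observation gradient term (w·x - y) * x — the 'map' step."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return [(dot - y) * xi for xi in x]

def gradient(w, data):
    """Add the per-observation terms together — the 'reduce' step."""
    terms = map(lambda obs: summand(w, obs[0], obs[1]), data)
    return reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], terms)

# two observations (features, target)
data = [([3.0, 1.0, 4.0], 2.0), ([1.0, 2.0, 1.0], 1.0)]
print(gradient([1.0, 1.0, 1.0], data))  # → [21.0, 12.0, 27.0]
```

In a distributed setting each worker would compute the summands for its partition of the observations, and only the small per-partition sums would be shuffled to the reducer.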

## Linear regression

The gradient descent update for linear regression is: $$\mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=1}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j \,$$

where:

• $i$ is the iteration number of the gradient descent algorithm,
• $j$ identifies the observation,
• $N$ is the number of observations,
• $(\mathbf{w}^\top \mathbf{x} - y) \mathbf{x}$ is the summand,
• $y$ is the target value,
• $\mathbf{x}$ is the features vector,
• $\mathbf{w}_i$ is the weights vector at iteration $i$; the weights start at zero: $\mathbf{w}_0 = [0, 0, 0, \dots, 0]$,
• $\alpha_i$ is the step size: $\alpha_i = \frac{\displaystyle \alpha}{\displaystyle n \sqrt{i+1}}$ , where $n$ is the number of observations and $\alpha$ is the initial step size, $\alpha = 1$.
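One update step can be sketched in plain Python as follows (names are illustrative; the step-size schedule $\alpha / (n \sqrt{i+1})$ is an assumption matching the definitions above):

```python
import math

def update(w, data, alpha0, i):
    """One update: w_{i+1} = w_i - alpha_i * sum_j (w·x_j - y_j) x_j."""
    n = len(data)
    alpha = alpha0 / (n * math.sqrt(i + 1))      # decreasing step size
    grad = [0.0] * len(w)
    for x, y in data:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        grad = [g + (dot - y) * xi for g, xi in zip(grad, x)]
    return [wi - alpha * g for wi, g in zip(w, grad)]

# starting from w_0 = [0, 0, 0] with a single observation
w1 = update([0.0, 0.0, 0.0], [([3.0, 1.0, 4.0], 2.0)], alpha0=1.0, i=0)
print(w1)  # → [6.0, 2.0, 8.0]
```

With $\mathbf{w}_0 = [0,0,0]$ the residual is $0 - 2 = -2$, so the gradient is $-2 \cdot [3,1,4]$ and, with $\alpha_0 = 1$, the first update moves the weights to $[6, 2, 8]$.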

### Summand

```
exampleW = [1, 1, 1]
exampleX = [3, 1, 4]
exampleY = 2.0
gradientSummand = (dot([1 1 1], [3 1 4]) - 2) * [3 1 4]
                = (8 - 2) * [3 1 4]
                = [18 6 24]
```
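The same computation with NumPy (the function name `gradient_summand` is illustrative):

```python
import numpy as np

def gradient_summand(w, x, y):
    """Summand (w·x - y) * x for a single observation."""
    return (np.dot(w, x) - y) * x

w = np.array([1.0, 1.0, 1.0])
x = np.array([3.0, 1.0, 4.0])
print(gradient_summand(w, x, 2.0))  # → [18.  6. 24.]
```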



## Convexity

Log loss is convex, which means that we can use gradient descent to find weights that result in a global minimum. The 0/1 loss is not convex due to its abrupt decision boundary at $z = 0$, so it is difficult to minimize with gradient descent.
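The squared-error loss of linear regression is convex too, so batch gradient descent should approach the global minimum. A minimal self-contained check, assuming a tiny hand-made data set generated from the true weights $[2, -1]$ (all names illustrative):

```python
import math

def gradient_descent(data, n_iter=1000, alpha=1.0):
    """Batch gradient descent for least-squares linear regression."""
    n, d = len(data), len(data[0][0])
    w = [0.0] * d                               # w_0 = [0, ..., 0]
    for i in range(n_iter):
        step = alpha / (n * math.sqrt(i + 1))   # decreasing step size
        grad = [0.0] * d
        for x, y in data:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            grad = [g + (dot - y) * xi for g, xi in zip(grad, x)]
        w = [wi - step * g for wi, g in zip(w, grad)]
    return w

# illustrative data consistent with true weights [2, -1]
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = gradient_descent(data)
print(w)  # close to [2.0, -1.0]: convexity means no spurious local minima
```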