# Data Mining - (Stochastic) Gradient descent (SGD)

Gradient descent can be used to train various kinds of regression and classification models.

It is an iterative process, and because each iteration's gradient is a sum of independent per-observation terms, the computation is well suited to a map-reduce process: the summands are computed in the map step and added up in the reduce step.
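As a sketch of that map/reduce split, in plain Python (function and variable names are illustrative, not from any particular framework):

```python
from functools import reduce

def summand(w, x, y):
    """Per-observation gradient term (w·x - y) * x — the 'map' step."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return [(dot - y) * xi for xi in x]

def gradient(w, data):
    """Add the per-observation terms together — the 'reduce' step."""
    terms = map(lambda obs: summand(w, obs[0], obs[1]), data)
    return reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], terms)

# two observations (features, target)
data = [([3.0, 1.0, 4.0], 2.0), ([1.0, 2.0, 1.0], 1.0)]
print(gradient([1.0, 1.0, 1.0], data))  # → [21.0, 12.0, 27.0]
```

In a distributed setting each worker would compute the summands for its partition of the observations, and only the small per-partition sums would be shuffled to the reducer.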

## Linear regression

The gradient descent update for linear regression is: $$\mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=1}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j \,$$

where:

• $i$ is the iteration number of the gradient descent algorithm,
• $j$ identifies the observation,
• $N$ is the number of observations,
• $(\mathbf{w}^\top \mathbf{x} - y) \mathbf{x}$ is the summand,
• $y$ is the target value,
• $\mathbf{x}$ is the features vector,
• $\mathbf{w}_i$ is the weights vector at iteration $i$; the weights start at zero: $\mathbf{w}_0 = [0, 0, 0, \dots, 0]$,
• $\alpha_i$ is the step size: $\alpha_i = \frac{\displaystyle \alpha}{\displaystyle n \sqrt{i+1}}$ , where $n$ is the number of observations and $\alpha$ is the initial step size, $\alpha = 1$.
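One update step can be sketched in plain Python as follows (names are illustrative; the step-size schedule $\alpha / (n \sqrt{i+1})$ is an assumption matching the definitions above):

```python
import math

def update(w, data, alpha0, i):
    """One update: w_{i+1} = w_i - alpha_i * sum_j (w·x_j - y_j) x_j."""
    n = len(data)
    alpha = alpha0 / (n * math.sqrt(i + 1))      # decreasing step size
    grad = [0.0] * len(w)
    for x, y in data:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        grad = [g + (dot - y) * xi for g, xi in zip(grad, x)]
    return [wi - alpha * g for wi, g in zip(w, grad)]

# starting from w_0 = [0, 0, 0] with a single observation
w1 = update([0.0, 0.0, 0.0], [([3.0, 1.0, 4.0], 2.0)], alpha0=1.0, i=0)
print(w1)  # → [6.0, 2.0, 8.0]
```

With $\mathbf{w}_0 = [0,0,0]$ the residual is $0 - 2 = -2$, so the gradient is $-2 \cdot [3,1,4]$ and, with $\alpha_0 = 1$, the first update moves the weights to $[6, 2, 8]$.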

### Summand

```
exampleW = [1, 1, 1]
exampleX = [3, 1, 4]
exampleY = 2.0
gradientSummand = (dot([1 1 1], [3 1 4]) - 2) * [3 1 4]
                = (8 - 2) * [3 1 4]
                = [18 6 24]
```
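The same computation with NumPy (the function name `gradient_summand` is illustrative):

```python
import numpy as np

def gradient_summand(w, x, y):
    """Summand (w·x - y) * x for a single observation."""
    return (np.dot(w, x) - y) * x

w = np.array([1.0, 1.0, 1.0])
x = np.array([3.0, 1.0, 4.0])
print(gradient_summand(w, x, 2.0))  # → [18.  6. 24.]
```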



## Convexity

Log loss is convex, which means that we can use gradient descent to find weights that result in a global minimum. The 0/1 loss is not convex due to its abrupt decision boundary at $z = 0$, so it is difficult to minimize with gradient descent.
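The squared-error loss of linear regression is convex too, so batch gradient descent should approach the global minimum. A minimal self-contained check, assuming a tiny hand-made data set generated from the true weights $[2, -1]$ (all names illustrative):

```python
import math

def gradient_descent(data, n_iter=1000, alpha=1.0):
    """Batch gradient descent for least-squares linear regression."""
    n, d = len(data), len(data[0][0])
    w = [0.0] * d                               # w_0 = [0, ..., 0]
    for i in range(n_iter):
        step = alpha / (n * math.sqrt(i + 1))   # decreasing step size
        grad = [0.0] * d
        for x, y in data:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            grad = [g + (dot - y) * xi for g, xi in zip(grad, x)]
        w = [wi - step * g for wi, g in zip(w, grad)]
    return w

# illustrative data consistent with true weights [2, -1]
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = gradient_descent(data)
print(w)  # close to [2.0, -1.0]: convexity means no spurious local minima
```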