About
Gradient descent can be used to train various kinds of regression and classification models.
It's an iterative process, and because each update is a sum of independent per-observation terms, a single iteration is well suited to MapReduce: the summands are computed in the map step and summed in the reduce step, as in the sketch below.
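A minimal pure-Python sketch of one iteration in map/reduce form (the names point_gradient and one_iteration are illustrative, not from any framework; the per-observation gradient used here is the linear regression summand defined below):

from functools import reduce

def point_gradient(w, x, y):
    # Gradient contribution of a single observation (linear regression).
    prediction = sum(wi * xi for wi, xi in zip(w, x))
    return [(prediction - y) * xi for xi in x]

def one_iteration(w, data, alpha):
    # Map step: compute each observation's summand independently.
    summands = map(lambda point: point_gradient(w, point[0], point[1]), data)
    # Reduce step: sum the summands into the full gradient.
    gradient = reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)], summands)
    # Update step: take one step against the gradient.
    return [wi - alpha * gi for wi, gi in zip(w, gradient)]

Here data is a list of (features, target) pairs; on a real cluster the map and reduce steps would run over partitions of the data instead of a local list.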
Linear regression
The gradient descent update for linear regression is (a code sketch follows the definitions below): <MATH> \mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_{j=1}^N (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j </MATH>
where:
- <math>i</math> is the iteration number of the gradient descent algorithm,
- <math>j</math> identifies the observation,
- <math>N</math> is the number of observations,
- <math>(\mathbf{w}_i^\top \mathbf{x}_j - y_j) \mathbf{x}_j</math> is the summand, i.e. the gradient contribution of observation <math>j</math>,
- <math>y_j</math> is the target value of observation <math>j</math>,
- <math>\mathbf{x}_j</math> is the feature vector of observation <math>j</math>,
- <math>\mathbf{w}_i</math> is the weight vector at iteration <math>i</math>, initialized to the zero vector: <math>\mathbf{w}_0 = [0,0,0,\dots,0]</math>,
- <math>\alpha_i</math> is the step size at iteration <math>i</math>. It starts at <math>\alpha_0 = 1</math> and decays as the iterations progress: <math>\alpha_i = \frac{\displaystyle \alpha_0}{\displaystyle N \sqrt{i+1}}</math>.
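Putting the update rule together with the step-size schedule, here is a minimal NumPy sketch of the loop. It is an illustration under stated assumptions, not a reference implementation: the function name gradient_descent and the iteration count are arbitrary, X is an N x d feature matrix, and y is a length-N target vector.

import numpy as np

def gradient_descent(X, y, num_iters=50):
    N, d = X.shape
    w = np.zeros(d)                               # w_0 = [0, 0, ..., 0]
    alpha_0 = 1.0                                 # initial step size
    for i in range(num_iters):
        # Full gradient: sum over the summands (w_i . x_j - y_j) x_j
        gradient = X.T.dot(X.dot(w) - y)
        alpha_i = alpha_0 / (N * np.sqrt(i + 1))  # decaying step size
        w = w - alpha_i * gradient
    return w

Whether this particular schedule converges depends on the scale of the features; in practice the initial step size is tuned.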
Summand
exampleW = [1, 1, 1]
exampleX = [3, 1, 4]
exampleY = 2.0
gradientSummand = (dot([1, 1, 1], [3, 1, 4]) - 2.0) * [3, 1, 4] = (8 - 2) * [3, 1, 4] = [18, 6, 24]
where:
- dot denotes the dot product.
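The same computation as a small NumPy sketch (gradient_summand is an illustrative name, not a library function):

import numpy as np

def gradient_summand(w, x, y):
    # Summand for one observation: (w . x - y) * x
    return (np.dot(w, x) - y) * x

example_w = np.array([1.0, 1.0, 1.0])
example_x = np.array([3.0, 1.0, 4.0])
example_y = 2.0
print(gradient_summand(example_w, example_x, example_y))  # [18.  6. 24.]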