Linear Algebra - Closest point in higher dimension than a plane

About

Solving closest point in the span of many vectors

Goal:

An algorithm that, given a vector b and vectors v1, . . . , vn, finds the vector in Span {v1, . . . , vn} that is closest to b.

Special case: We can use the algorithm to determine whether b lies in Span {v1, . . . , vn}: If the vector in Span {v1, . . . , vn} closest to b is b itself then clearly b is in the span; if not, then b is not in the span.

Testing if b is in Span {v1, . . . , vn} is equivalent to testing if the equation Ax = b has a solution.

More generally: Even if Ax = b has no solution, we can use the algorithm to find the point in <math>\{Ax : x \in \mathbb{R}\}</math> closest to b.

Computation Problem

Goal: find the coefficients x1, . . . , xn so that x1 v1 + · · · + xn vn is the vector in Span {v1, . . . , vn} closest to b.

Two methods:

Norm Minimization

Equivalent: Find the vector x that minimizes:

<math> \| b - A.x \|</math>
<math> {\| b - A.x \|}^2</math>

Given a matrix equation Ax = b that might have no solution, find the best solution available in the sense that the norm of the error b - Ax is as small as possible.

Two algo:

one based on Gaussian elimination.
one based on orthogonality that is much faster and more reliable than gradient descent

Least square

Find the vector x that minimizes:

<math> \| b - A.x \|</math>
<math> {\| b - A.x \|}^2</math>
<math> {\| \begin{bmatrix}b\end{bmatrix} - \begin{bmatrix} \begin{array}{r|r|r} && \\ v_1 & \dots & v_n \\ && \end{array} \end{bmatrix}. \begin{bmatrix}x\end{bmatrix} ||}^2</math>
<math> {|| \begin{bmatrix}b\end{bmatrix} - \begin{bmatrix} \begin{array}{r} a_1 \\ \hline \\ \dots \\ \hline \\ a_m \\ \end{array} \end{bmatrix}.\begin{bmatrix}x\end{bmatrix} \|}^2</math>
<math>(b[1]-a_1.x)^2 + \dots + (b[m]-a_m.x)^2</math>

This problem is called least squares (”methode des moindres carres”, due to Adrien-Marie Legendre but often attributed to Gauss).

This problem can be addressed using gradient descent.

Projection

High-dimensional projection onto/orthogonal to

For a vector b and a vector space V, we define the projection of b onto V (written <math>b^{||V}</math> ) and the projection of b orthogonal to V (written <math>b^{\perp V}</math> ) so that <MATH> b = b^{||V} + b^{\perp V} </MATH> where:

the projection onto V: <math>b^{||V}</math> is in V, and
the projection orthogonal to V: <math>b^{\perp V}</math> is orthogonal to every vector in V.

Lemma

High-Dimensional Lemma: The point in vector space V closest to b is <math>b^{||V}</math> and the distance is the norm <math>\|b^{\perp V}\|</math> .

Computation

Projection orthogonal

The below function will find the projection of b orthogonal to the space spanned by mutually orthogonal vectors.

# input: a vector b, and a list vlist of mutually orthogonal vectors
# output: the projection orthogonal of b (\perp) to the vectors in vlist
def project_orthogonal(b, vlist):
    for v in vlist:
        b = b - project_along(b, v)
    return b

where:

vlist consists of mutually orthogonal vectors: the <math>i^{th}</math> vector in the list is orthogonal to the <math>j^{th}</math> vector in the list for every <math>i \neq j</math> .
project_along is the one-dimensional projection along the line

The projection of a vector b orthogonal to a vector space is unique, so in principle the order of vectors in vlist doesn’t affect the output of project orthogonal(b, vlist).

The below function will return the <math>\sigma</math> of all generators.

def aug_project_orthogonal(b, vlist):
    alphadict = {len(vlist):1}
    for i in range(len(vlist)):
        v = vlist[i]
        sigma = (b*v)/(v*v) if v*v > 0 else 0
        alphadict[i] = sigma
        b = b - sigma*v
    return (b, alphadict)

Solution

Apply orthogonalize to <math>{v_1} , \dots , {v_n}</math> and obtain a span of mutually orthogonal vector <math>v_1^* , \dots , v_n^*</math> . Then <math>V = Span\{v_1^* , \dots , v_n^*\}</math>
Call project_orthogonal(b, [<math>v_1^* , \dots , v_n^*</math> ]) and obtain <math>b^{\perp}</math> as the result.

Exactly the same computations take place when orthogonalize is applied. Ie applying orthogonalize to the list of vector <math>[{v_1} , \dots , {v_n}, b]</math> will return <math>[v_1^* , \dots , v_n^*, b^{\perp}]</math>