Given
Given many data points, construct a matrix equation of the form $A\mathbf{x} = \mathbf{b}$ (this matrix equation will be overdetermined: more equations than unknowns). The equation is linear here, but the underlying model doesn't have to be; just adjust the columns of $A$ accordingly to represent the equations as a matrix equation.
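As a concrete sketch (data values are made up for illustration), fitting a line $y = c_0 + c_1 x$ to four data points produces one linear equation per point, stacked into an overdetermined system:

```python
import numpy as np

# Hypothetical data points (x_i, y_i); values are illustrative.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.1, 1.9, 3.2, 3.9])

# Fitting y = c0 + c1*x gives one linear equation per data point,
# stacked into the overdetermined matrix equation A c = b.
A = np.column_stack([np.ones_like(xs), xs])  # each row: [1, x_i]
b = ys

print(A.shape)  # → (4, 2): 4 equations, 2 unknowns
```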
Goal
Using Best Approximation, find the vector $\hat{\mathbf{b}}$ in the subspace $W$ closest to $\mathbf{b}$.
$W = \operatorname{Col} A$ is a subspace of $\mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^m$, i.e. $\hat{\mathbf{b}} = \operatorname{proj}_{W} \mathbf{b}$. Note: Can only use Orthogonal Decomposition when the columns of $A$ form an orthogonal basis, by definition.
In other words, $\hat{\mathbf{b}}$ is the closest vector in $W$ to $\mathbf{b}$: $\|\mathbf{b} - \hat{\mathbf{b}}\| \le \|\mathbf{b} - \mathbf{v}\|$ for all $\mathbf{v}$ in $W$. Note: $\hat{\mathbf{b}}$ is a unique vector, the special element of $W$ that minimizes the above distance. Note: If the columns of $A$ are orthogonal, then you can just use the scalar projection of $\mathbf{b}$ onto each column of $A$.
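When the columns of $A$ happen to be orthogonal, the projection can be computed column-by-column with scalar projections, as the note says. A minimal numpy sketch (matrix values are illustrative):

```python
import numpy as np

# Columns of A are orthogonal here (their dot product is 0), so the
# projection of b onto Col A is the sum of the scalar projections
# onto each column: b̂ = Σ_i ((b · a_i) / (a_i · a_i)) a_i
A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [0.0,  0.0]])
b = np.array([2.0, 4.0, 5.0])

bhat = sum((b @ a) / (a @ a) * a for a in A.T)
print(bhat)  # → [2. 4. 0.]
```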

Normal Equations
The least squares solutions to $A\mathbf{x} = \mathbf{b}$ correspond to the solutions of $A^{T}A\mathbf{x} = A^{T}\mathbf{b}$.
- Takes the (generally non-square) equation from above and transforms it into a square matrix equation
Derivation
$\hat{\mathbf{b}} = A\hat{\mathbf{x}}$ lies in $\operatorname{Col} A$, and $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to $\operatorname{Col} A$, so each column $\mathbf{a}_i$ of $A$ satisfies $\mathbf{a}_i \cdot (\mathbf{b} - A\hat{\mathbf{x}}) = 0$. Stacking these equations gives $A^{T}(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, i.e. $A^{T}A\hat{\mathbf{x}} = A^{T}\mathbf{b}$.
- $\hat{\mathbf{x}}$ is the Least Squares Solution to $A\mathbf{x} = \mathbf{b}$
Normal Equation Usage
- Use when $A$ is a non-square matrix
- Over/underdetermined systems
- Regression
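A quick numpy sketch of solving the normal equations for an overdetermined system (the matrix is a small illustrative example):

```python
import numpy as np

# Overdetermined: 3 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x̂ = A^T b is a square 2x2 system.
xhat = np.linalg.solve(A.T @ A, A.T @ b)
print(xhat)  # → [ 5. -3.]
```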
Theorem (Unique Solutions for Least Squares)
If $A$ is $m \times n$, the following statements are equivalent:
- $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution for each $\mathbf{b}$ in $\mathbb{R}^m$
- The columns of $A$ are linearly independent
- The matrix $A^{T}A$ is invertible

If the above hold, the unique least squares solution is $\hat{\mathbf{x}} = (A^{T}A)^{-1}A^{T}\mathbf{b}$.
If the above conditions are not true, there are infinitely many least squares solutions, in which case you should consider a different method instead.
Note: $A^{T}A$ plays the role of the “length squared” of the matrix $A$
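Assuming $A^{T}A$ is invertible, the closed form $\hat{\mathbf{x}} = (A^{T}A)^{-1}A^{T}\mathbf{b}$ can be checked against numpy's built-in least squares routine (matrix values are illustrative):

```python
import numpy as np

A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])  # linearly independent columns
b = np.array([2.0, 0.0, 11.0])

# Closed form: x̂ = (A^T A)^{-1} A^T b (fine for small examples;
# numerically, prefer solve() or QR over an explicit inverse).
xhat = np.linalg.inv(A.T @ A) @ A.T @ b

# Agrees with numpy's least squares routine:
xls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(xhat, xls))  # → True
```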
Theorem (Least Squares and QR)
If $A$ is an $m \times n$ matrix with linearly independent columns and $A = QR$ is a QR factorization of $A$, then for each $\mathbf{b}$ in $\mathbb{R}^m$ the equation $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution, given by $\hat{\mathbf{x}} = R^{-1}Q^{T}\mathbf{b}$. In practice, solve $R\hat{\mathbf{x}} = Q^{T}\mathbf{b}$ by back-substitution rather than inverting $R$.
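Under the assumption that $A$ has linearly independent columns, the QR route looks like this in numpy (matrix values are illustrative):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])  # linearly independent columns
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)              # reduced QR: Q is 3x2, R is 2x2
xhat = np.linalg.solve(R, Q.T @ b)  # solve R x̂ = Q^T b, no inverse
print(xhat)  # → [1. 2.]
```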
Examples
Hampton Explanation for Least Squares
Let $W = \operatorname{Col} A$. $\hat{\mathbf{x}}$ is the unique, minimizing solution to the equation $A\mathbf{x} = \mathbf{b}$ such that $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.
- Essentially, minimize $\|\mathbf{b} - A\mathbf{x}\|$
- $\|\mathbf{b} - A\hat{\mathbf{x}}\|$ is the minimal distance among all the candidate solutions
- Goal: Find $\hat{\mathbf{x}}$ s.t. $A\hat{\mathbf{x}}$ is closest to $\mathbf{b}$
- $\hat{\mathbf{x}}$ in this context just denotes the special/unique $\mathbf{x}$ that minimizes the distance between $A\mathbf{x}$ and $\mathbf{b}$
- $\mathbf{b}$ is closer to $A\hat{\mathbf{x}}$ than to $A\mathbf{x}$ for every other $\mathbf{x}$ (as $\mathbf{x}$ varies, $A\mathbf{x}$ ranges over $\operatorname{Col} A$)
- If $\mathbf{b}$ is in $\operatorname{Col} A$, then $\hat{\mathbf{x}}$ is an exact solution: $A\hat{\mathbf{x}} = \mathbf{b}$
- Seek $\hat{\mathbf{x}}$ so that $A\hat{\mathbf{x}}$ is as close to $\mathbf{b}$ as possible, i.e. $\hat{\mathbf{x}}$ should solve $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$, where $\hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A}\mathbf{b}$
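The minimizing property can be sanity-checked numerically: the residual at $\hat{\mathbf{x}}$ should be no larger than at any other $\mathbf{x}$. A sketch with random trial points (illustrative matrix):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
best = np.linalg.norm(b - A @ xhat)

# Every other x gives a residual at least as large as ||b - Ax̂||.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=2)
    assert np.linalg.norm(b - A @ x) >= best
```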