
6.5 Least-Squares Problems

\[ A = \begin{bmatrix} -1 & 4 \\ 2 & -3 \\ -1 & 3 \end{bmatrix} \quad \vec{b} = \begin{bmatrix} 12 \\ 0 \\ 6 \end{bmatrix} \]

\[ [A \quad \vec{b}] \sim \dots \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Pivot in the rightmost column → inconsistent

no solution to \( A\vec{x} = \vec{b} \)

\( \vec{b} \) is not in \( \text{Col } A \)

so instead project \( \vec{b} \) onto \( \text{Col } A \): let \( \hat{b} = \text{proj}_{\text{Col } A} \vec{b} \). Since \( \hat{b} \) is in \( \text{Col } A \), the system \( A\hat{x} = \hat{b} \) is consistent

and \( \| \vec{b} - A\hat{x} \| \leq \| \vec{b} - A\vec{x} \| \) for every \( \vec{x} \)

  • \( \vec{x} \): some other \( \vec{x} \)
  • \( A\vec{x} \): some vector in \( \text{Col } A \)
[Figure: projection of \( \vec{b} \) onto the subspace \( \text{Col } A \); the error vector \( \vec{b} - A\hat{x} \) is orthogonal to the plane.]

\( \hat{b} = \text{proj}_{\text{Col } A} \vec{b} = A\hat{x} \)

\( A\hat{x} = \hat{b} \) is the vector in \( \text{Col } A \) closest to \( \vec{b} \)

\( \hat{x} \): Least-squares solution

(its length \( \| \vec{b} - A\hat{x} \| \) is the square root of a sum of squares, and it is minimized; hence the name least squares)


How to find \( \hat{x} \)?

\( \vec{b} - \hat{b} = \vec{b} - A\hat{x} \) is orthogonal to \( \text{Col } A \)

so each column of \( A \) dotted with \( \vec{b} - A\hat{x} \) is 0

\[ \Rightarrow A^T (\vec{b} - A\hat{x}) = \vec{0} \]

\[ A^T \vec{b} - A^T A \hat{x} = \vec{0} \]

\( A^T A \hat{x} = A^T \vec{b} \)  (the normal equations)

solve this for \( \hat{x} \)
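
In code this amounts to forming \( A^T A \) and \( A^T \vec{b} \) and solving the square system; a minimal numpy sketch (the helper name `normal_eq_solve` is ours, and it assumes \( A^T A \) is invertible):

```python
import numpy as np

def normal_eq_solve(A, b):
    # Form and solve the normal equations  A^T A x = A^T b.
    # Assumes the columns of A are linearly independent, so A^T A is
    # invertible; otherwise np.linalg.solve raises LinAlgError.
    return np.linalg.solve(A.T @ A, A.T @ b)
```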

example

\[ A = \begin{bmatrix} -1 & 4 \\ 2 & -3 \\ -1 & 3 \end{bmatrix} \quad \vec{b} = \begin{bmatrix} 12 \\ 0 \\ 6 \end{bmatrix} \]

\[ A^T = \begin{bmatrix} -1 & 2 & -1 \\ 4 & -3 & 3 \end{bmatrix} \]

\[ A^T A = \begin{bmatrix} 6 & -13 \\ -13 & 34 \end{bmatrix} \quad A^T \vec{b} = \begin{bmatrix} -18 \\ 66 \end{bmatrix} \]

\[ \begin{bmatrix} 6 & -13 & -18 \\ -13 & 34 & 66 \end{bmatrix} \sim \dots \sim \begin{bmatrix} 1 & 0 & 246/35 \\ 0 & 1 & 162/35 \end{bmatrix} \]
\[ \hat{x} = \begin{bmatrix} 246/35 \\ 162/35 \end{bmatrix} \]

and

\[ \hat{b} = \text{proj}_{\text{Col } A} \vec{b} = \frac{246}{35} \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix} + \frac{162}{35} \begin{bmatrix} 4 \\ -3 \\ 3 \end{bmatrix} \]
\[ = A \hat{x} = \begin{bmatrix} 402/35 \\ 6/35 \\ 48/7 \end{bmatrix} \]
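
A quick numerical check of this example (a numpy sketch; the printed values should match the fractions computed above):

```python
import numpy as np

A = np.array([[-1.0,  4.0],
              [ 2.0, -3.0],
              [-1.0,  3.0]])
b = np.array([12.0, 0.0, 6.0])

# Least-squares solution from the normal equations
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)              # [246/35, 162/35] ≈ [7.0286, 4.6286]

# Projection of b onto Col A
b_hat = A @ x_hat
print(b_hat)              # [402/35, 6/35, 48/7] ≈ [11.4857, 0.1714, 6.8571]

# The residual should be orthogonal to every column of A
print(A.T @ (b - b_hat))  # ≈ [0, 0]
```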
[Figure: \( \vec{b} \) projected onto the plane \( \text{Col } A \) to form the projection \( \hat{b} \).]
\[ A^T A \hat{x} = A^T \vec{b} \]

ALWAYS has at least one solution

(because \( \hat{b} \) is in \( \text{Col } A \))

but there can be more than one least-squares solution \( \hat{x} \)


example

\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix} \quad \vec{b} = \begin{bmatrix} 2 \\ 2 \\ 3 \\ 7 \end{bmatrix} \]
\[ [A \quad \vec{b}] \sim \dots \sim \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

pivot in the rightmost column

\( A\vec{x} = \vec{b} \) has no solution

\( \vec{b} \) is not in \( \text{Col } A \)

find \( \hat{x} \) : \( A^T A \hat{x} = A^T \vec{b} \)

also works if \( A\vec{x} = \vec{b} \) is consistent

\[ A^T A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{bmatrix} \]
[Figure: the plane \( \text{Col } A \) in the case \( \vec{b} = \hat{b} \), so that \( \vec{x} = \hat{x} \).]
\[ A^T \vec{b} = \begin{bmatrix} 14 \\ 4 \\ 10 \end{bmatrix} \]

\( A: n \times m \)

\( A^T: m \times n \)

\( A^T A: m \times m \), always square

\[ \begin{bmatrix} 4 & 2 & 2 & 14 \\ 2 & 2 & 0 & 4 \\ 2 & 0 & 2 & 10 \end{bmatrix} \sim \dots \sim \begin{bmatrix} 1 & 0 & 1 & 5 \\ 0 & 1 & -1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

Least Squares Solutions and Linear Independence

System of Equations:

\[ \begin{aligned} x_3 & \text{ free} \\ x_2 &= x_3 - 3 \\ x_1 &= 5 - x_3 \end{aligned} \]

Vector Form:

\[ \hat{x} = \begin{bmatrix} 5 \\ -3 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \]
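
Each value of the free variable \( x_3 \) gives a different least-squares solution, but every one of them produces the same projection \( \hat{b} \); a short numpy check (a sketch using the solution family above):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])

# General least-squares solution: [5, -3, 0] + x3 * [-1, 1, 1]
for x3 in (0.0, 1.0, -2.5):
    x_hat = np.array([5.0, -3.0, 0.0]) + x3 * np.array([-1.0, 1.0, 1.0])
    print(x3, A @ x_hat)  # b_hat = [2, 2, 5, 5] for every choice of x3
```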

There are many ways to build \( \hat{b} \) out of columns of

\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

because the columns of \( A \) here are not linearly independent.

If columns of \( A \) are linearly independent

then \( \hat{x} = (A^T A)^{-1} A^T \vec{b} \)

Problematic if \( A \) is large or ill-conditioned

because small errors in \( A^T A \) can make \( \hat{x} \) very wrong.
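
One way to see why (a hedged illustration with a made-up matrix, not from the notes): forming \( A^T A \) roughly squares the condition number of \( A \), so small errors are amplified much more when solving the normal equations than when working with \( A \) itself.

```python
import numpy as np

eps = 1e-6
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps],
              [1.0, 1.0 - eps]])

print(np.linalg.cond(A))        # roughly 2.4e6
print(np.linalg.cond(A.T @ A))  # roughly 6e12, about cond(A)**2
```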

\[ A^T A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{bmatrix} \] has the same dependence among its columns (the first column is the sum of the other two, just as in \( A \)).

So \( A^T A \) is not invertible.

So \( (A^T A) \hat{x} = A^T \vec{b} \) does not have a unique solution.
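
Numerically, a plain solve on these singular normal equations should fail, while a least-squares routine simply picks one member of the solution family; a numpy sketch (np.linalg.lstsq returns the minimum-norm solution, which here corresponds to \( x_3 = 8/3 \)):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
b = np.array([2.0, 2.0, 3.0, 7.0])

AtA, Atb = A.T @ A, A.T @ b

try:
    np.linalg.solve(AtA, Atb)          # A^T A is singular here
except np.linalg.LinAlgError as err:
    print("solve failed:", err)

# lstsq still works: it returns the minimum-norm least-squares solution,
# i.e. the member of the solution family with x3 = 8/3
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                           # ≈ [2.3333, -0.3333, 2.6667]
```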


Stabilizing the Algorithm: QR Decomposition

To stabilize the algorithm (assuming the columns of \( A \) are linearly independent), factor \( A = QR \) where:

  • \( R \) is an upper triangular matrix with positive diagonal elements.
  • The columns of \( Q \) form an orthonormal basis for \( \text{Col } A \).
  • \( Q^T Q = I \) (the columns of \( Q \) are orthonormal; \( Q \) need not be square, so \( Q^{-1} \) may not exist)

Derivation of the stabilized least squares solution:

\[ \begin{aligned} A^T A \hat{x} &= A^T \vec{b} \\ (QR)^T QR \hat{x} &= (QR)^T \vec{b} \\ R^T Q^T Q R \hat{x} &= R^T Q^T \vec{b} \end{aligned} \]

Note: Since \( Q^T Q = I \):

\[ R^T R \hat{x} = R^T Q^T \vec{b} \]
\( R \) is triangular with positive (hence non-zero) entries on its main diagonal, so \( (R^T)^{-1} \) exists. Left-multiplying by it, and using \( (R^T)^{-1} R^T = I \):

\[ R \hat{x} = Q^T \vec{b} \]

\[ \hat{x} = R^{-1} Q^T \vec{b} \]
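
In practice \( R\hat{x} = Q^T \vec{b} \) is solved by back substitution rather than by forming \( R^{-1} \); a numpy/scipy sketch using the first example's \( A \) and \( \vec{b} \):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[-1.0,  4.0],
              [ 2.0, -3.0],
              [-1.0,  3.0]])
b = np.array([12.0, 0.0, 6.0])

# Thin QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular.
# (numpy may flip signs relative to the "positive diagonal" convention,
# but the least-squares solution comes out the same.)
Q, R = np.linalg.qr(A)

x_hat = solve_triangular(R, Q.T @ b)   # back substitution on R x = Q^T b
print(x_hat)                           # ≈ [7.0286, 4.6286], as before
```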