$$ \newcommand{\qed}{\tag*{$\square$}} \newcommand{\span}{\operatorname{span}} \newcommand{\dim}{\operatorname{dim}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\norm}[1]{\|#1\|} \newcommand{\grad}{\nabla} \newcommand{\prox}[1]{\operatorname{prox}_{#1}} \newcommand{\inner}[2]{\langle{#1}, {#2}\rangle} \newcommand{\mat}[1]{\mathcal{M}[#1]} \newcommand{\null}[1]{\operatorname{null} \left(#1\right)} \newcommand{\range}[1]{\operatorname{range} \left(#1\right)} \newcommand{\rowvec}[1]{\begin{bmatrix} #1 \end{bmatrix}^T} \newcommand{\Reals}{\mathbf{R}} \newcommand{\RR}{\mathbf{R}} \newcommand{\Complex}{\mathbf{C}} \newcommand{\Field}{\mathbf{F}} \newcommand{\Pb}{\operatorname{Pr}} \newcommand{\E}[1]{\operatorname{E}[#1]} \newcommand{\Var}[1]{\operatorname{Var}[#1]} \newcommand{\argmin}[2]{\underset{#1}{\operatorname{argmin}} {#2}} \newcommand{\optmin}[3]{ \begin{align*} & \underset{#1}{\text{minimize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optmax}[3]{ \begin{align*} & \underset{#1}{\text{maximize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optfind}[2]{ \begin{align*} & {\text{find}} & & #1 \\ & \text{subject to} & & #2 \end{align*} } $$
This section treats the problem of least squares. We are given a matrix $A \in \Reals^{m \times n}$ and a vector $b \in \Reals^m$, and our goal is to find a vector $x \in \Reals^n$ that minimizes $\norm{Ax - b}$. This problem has “squares” in its name because it is often posed as the equivalent problem of minimizing $\norm{Ax - b}^2$.
In our study of orthogonal projections, we saw that if $m \geq n$ and $A$ is full rank, then this problem has a unique solution $x = A^\dagger b$, where $A^\dagger = (A^TA)^{-1}A^T$ is the pseudoinverse of $A$. The vector in $\range{A}$ that achieves the minimum distance to $b$ is $QQ^Tb$, where $Q$ is the familiar matrix with orthonormal columns from $A$'s QR factorization (its columns form an orthonormal basis for $\range{A}$).
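A quick numerical sketch (mine, not part of the original notes) makes the full-rank case concrete: it compares the formula $(A^TA)^{-1}A^Tb$ against NumPy's least squares routine and against the projection $QQ^Tb$. The sizes and random seed below are arbitrary choices for illustration.

```python
import numpy as np

# A small full-rank instance; the sizes and seed are arbitrary choices.
rng = np.random.default_rng(0)
m, n = 10, 4
A = rng.standard_normal((m, n))   # full column rank with probability one
b = rng.standard_normal(m)

# Pseudoinverse formula for the full-rank, overdetermined case.
x = np.linalg.solve(A.T @ A, A.T @ b)

# Agrees with NumPy's least squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)

# The closest point in range(A) to b is the projection Q Q^T b.
Q = np.linalg.qr(A)[0]            # reduced QR: Q has orthonormal columns
assert np.allclose(A @ x, Q @ Q.T @ b)
```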
Regardless of whether $A$ is full rank, and regardless of whether $m \geq n$, the vector in $\range{A}$ achieving minimum distance to $b$ is the orthogonal projection of $b$ onto $\range{A}$. This means that if $Q$ is a matrix with orthonormal columns spanning the range of $A$, then $QQ^Tb$ minimizes $\norm{z - b}$ over all $z \in \range{A}$; this was proved in the previous section. When $A$ has a non-trivial nullspace, however, the least squares problem has infinitely many solutions.
Consider the general least squares problem, with $A \in \Reals^{m \times n}$ and $b \in \Reals^m$ the problem data. Let $Q$ be a matrix with orthonormal columns spanning the range of $A$; such a matrix can be obtained by applying the modified Gram-Schmidt algorithm to the columns of $A$. Let $x^\star$ be a vector in $\Reals^n$ such that $Ax^\star = QQ^Tb$ (such a vector must exist, since $QQ^T$ is the orthogonal projection onto $\range{A}$, which means $QQ^Tb \in \range{A}$). Then the solution set of the least squares problem is

$$X = \{x^\star + v \mid v \in \null{A}\}.$$
The set $X$ always has at least one member, namely $x^\star$. If $A$ has a nontrivial nullspace (i.e., if there is a vector other than $0$ in the null space of $A$), then $X$ has infinitely many members.
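The following sketch (again my own, with an arbitrary rank-deficient instance) illustrates the solution set: it computes one particular solution $x^\star$ and checks that adding any nullspace vector leaves the residual unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 8, 5, 3                  # rank r < n, so null(A) is nontrivial
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

# Orthonormal basis for range(A) from the SVD (Gram-Schmidt would also work).
U, s, Vt = np.linalg.svd(A)
Q = U[:, :r]

# A particular solution x_star with A x_star = Q Q^T b.
x_star = np.linalg.lstsq(A, Q @ Q.T @ b, rcond=None)[0]
assert np.allclose(A @ x_star, Q @ Q.T @ b)

# Adding any vector in null(A) gives another solution with the same residual.
v = Vt[r:].T @ rng.standard_normal(n - r)
assert np.allclose(A @ v, 0)
assert np.isclose(np.linalg.norm(A @ (x_star + v) - b),
                  np.linalg.norm(A @ x_star - b))
```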
We can also characterize the residual vector $r = Ax^\star - b$, where $x^\star$ is any particular solution to the least squares problem. Let $Q$ be a matrix whose columns are an orthonormal basis for $\range{A}$ and $\tilde{Q}$ a matrix whose columns are an orthonormal basis for $\range{A}^\perp$. The residual vector is

$$r = Ax^\star - b = QQ^Tb - b = -(I - QQ^T)b = -\tilde{Q}\tilde{Q}^Tb,$$
where the last equality comes from the fact that $I - QQ^T$ is the complementary projection of $QQ^T$, i.e., it is the projection onto $\range{A}^\perp$. Notice that we say the residual vector, because the residual vector is unique; while the least squares problem might have infinitely many solutions, there is a unique vector in $\range{A}$ that is closest to $b$. The optimal squared residual is therefore $\norm{r}^2 = \norm{\tilde{Q}\tilde{Q}^Tb}^2$. Although $\tilde{Q}$ has orthonormal columns, it is not an orthogonal matrix (because it is not square), so we cannot further simplify the righthand side by replacing $\tilde{Q}\tilde{Q}^T$ with the identity.
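As a sanity check on the residual formula (my own sketch; the instance and the SVD-based choice of bases are arbitrary), one can verify numerically that $Ax^\star - b$ equals $-\tilde{Q}\tilde{Q}^Tb$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 8, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r
b = rng.standard_normal(m)

# Orthonormal bases for range(A) and its orthogonal complement.
U = np.linalg.svd(A)[0]
Q, Q_tilde = U[:, :r], U[:, r:]

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
residual = A @ x_star - b

# The residual is minus the projection of b onto range(A)^perp.
assert np.allclose(residual, -Q_tilde @ Q_tilde.T @ b)
print("optimal squared residual:", np.linalg.norm(residual) ** 2)
```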
Suppose we want to estimate a vector $x \in \Reals^n$. We cannot measure $x$ directly; instead, we observe a vector $y \in \Reals^m$, which we know to be a function of $x$: $y = Ax + z$. The matrix $A \in \Reals^{m \times n}$, with $n < m$ and linearly independent columns, represents a sensor for $x$, and the vector $z$ is a random variable representing measurement noise with mean $0$ and variance $\sigma^2 I$. We seek an unbiased estimate of $x$ that is a linear function of $y$.
One such estimate is $\hat{x} = A^\dagger y$, since $\E{A^\dagger y} = A^\dagger Ax + A^\dagger \E{z} = x$. It turns out that $A^\dagger y$ has lower variance than any other linear unbiased estimator of $x$; in this sense, $A^\dagger y$ is the best linear unbiased estimator of $x$. To see this, let $By$ be some other unbiased estimator of $x$, and note that $BA = I$ (unbiasedness means $\E{By} = BAx = x$ for every $x$, which forces $BA = I$). We will show that $BB^T - A^\dagger (A^\dagger)^T$ is positive semidefinite.
The variance of $By$ is

$$\Var{By} = \E{(By - x)(By - x)^T} = \E{Bzz^TB^T} = \sigma^2 BB^T,$$

and in particular the variance of $A^\dagger y$ is $\sigma^2 A^\dagger (A^\dagger)^T$.
Because $AA^\dagger$ is an orthogonal projection (onto $\range{A}$), $\norm{AA^\dagger w} \leq \norm{w}$ for all $w$, and $AA^\dagger B^T = A(A^TA)^{-1}A^TB^T = A(A^TA)^{-1}(BA)^T = (A^\dagger)^T$. Hence

$$w^TA^\dagger (A^\dagger)^Tw = \norm{(A^\dagger)^Tw}^2 = \norm{AA^\dagger B^Tw}^2 \leq \norm{B^Tw}^2 = w^TBB^Tw \quad \text{for all } w.$$
This implies that $BB^T - A^\dagger(A^\dagger)^T$ is positive semidefinite.
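To see the comparison concretely, the sketch below (an illustration under the assumptions above; the particular alternative left inverse $B$ is constructed arbitrarily) checks numerically that $BB^T - A^\dagger(A^\dagger)^T$ is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 4
A = rng.standard_normal((m, n))       # n < m, full column rank
A_dag = np.linalg.pinv(A)             # equals (A^T A)^{-1} A^T here

# Any other unbiased linear estimator B y must satisfy B A = I.
# Construct one by adding a matrix C with C A = 0 to A_dag.
C = rng.standard_normal((n, m))
C = C - C @ (A @ A_dag)               # A A_dag projects onto range(A), so C A = 0
B = A_dag + C
assert np.allclose(B @ A, np.eye(n))

# The difference of (scaled) variances is positive semidefinite.
D = B @ B.T - A_dag @ A_dag.T
assert np.linalg.eigvalsh((D + D.T) / 2).min() >= -1e-10
```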
The vector $A^\dagger b$ gives the least-norm solution of the general least squares problem (in which $A$ may be singular), where $A^\dagger$ is the Moore-Penrose inverse (see the notes on the SVD). To see why this is the case, let $A = U\Sigma V^T$ be an SVD of $A$, so that $A^\dagger = V\Sigma^\dagger U^T$ and $A^\dagger b = V\Sigma^\dagger U^Tb$ (the matrix $\Sigma^\dagger$ is obtained by transposing $\Sigma$ and inverting the non-zero entries). If $A$ is diagonal, then it is clear that the least-norm minimizer of $\norm{Ax - b}$ is $A^\dagger b$: when $A_{ii} \neq 0$, setting $x_i = b_i / A_{ii}$ zeroes the $i$th residual term, and when $A_{ii} = 0$, $x_i$ does not affect the residual, so the least-norm choice is $x_i = 0$. Otherwise, note that
$$\norm{Ax - b} = \norm{U\Sigma V^Tx - b} = \norm{\Sigma V^Tx - U^Tb},$$
since $U^T$ is an isometry. Let $z = V^Tx$. Note that $\norm{z} = \norm{x}$ and that $V^T$ is invertible, so $x$ is the least-norm minimizer of $\norm{Ax - b}$ if and only if $z = V^Tx$ is the least-norm minimizer of $\norm{\Sigma z - U^Tb}$. Since $\Sigma$ is diagonal, by our previous argument we conclude that the least-norm minimizer of $\norm{\Sigma z - U^Tb}$ is $z = \Sigma^\dagger U^Tb$. Therefore, the least-norm minimizer of $\norm{Ax - b}$ is $x = Vz = V\Sigma^\dagger U^Tb = A^\dagger b$.
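The SVD argument can also be checked numerically; in the sketch below (mine, with an arbitrary rank-deficient instance), the explicit construction $V\Sigma^\dagger U^Tb$ matches `numpy.linalg.pinv`, and adding a nullspace component preserves the residual while increasing the norm.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 8, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r < n
b = rng.standard_normal(m)

# Thin SVD and the explicit Moore-Penrose construction V Sigma^dagger U^T b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_dag = np.array([1.0 / si if si > 1e-10 else 0.0 for si in s])
x_ln = Vt.T @ (s_dag * (U.T @ b))
assert np.allclose(x_ln, np.linalg.pinv(A) @ b)

# Any other solution x_ln + v (v in null(A)) has the same residual, larger norm.
v = Vt[r:].T @ rng.standard_normal(n - r)
assert np.isclose(np.linalg.norm(A @ (x_ln + v) - b),
                  np.linalg.norm(A @ x_ln - b))
assert np.linalg.norm(x_ln + v) >= np.linalg.norm(x_ln)
```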