$$ \newcommand{\qed}{\tag*{$\square$}} \newcommand{\span}{\operatorname{span}} \newcommand{\dim}{\operatorname{dim}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\norm}[1]{\|#1\|} \newcommand{\grad}{\nabla} \newcommand{\prox}[1]{\operatorname{prox}_{#1}} \newcommand{\inner}[2]{\langle{#1}, {#2}\rangle} \newcommand{\mat}[1]{\mathcal{M}[#1]} \newcommand{\null}[1]{\operatorname{null} \left(#1\right)} \newcommand{\range}[1]{\operatorname{range} \left(#1\right)} \newcommand{\rowvec}[1]{\begin{bmatrix} #1 \end{bmatrix}^T} \newcommand{\Reals}{\mathbf{R}} \newcommand{\RR}{\mathbf{R}} \newcommand{\Complex}{\mathbf{C}} \newcommand{\Field}{\mathbf{F}} \newcommand{\Pb}{\operatorname{Pr}} \newcommand{\E}[1]{\operatorname{E}[#1]} \newcommand{\Var}[1]{\operatorname{Var}[#1]} \newcommand{\argmin}[2]{\underset{#1}{\operatorname{argmin}} {#2}} \newcommand{\optmin}[3]{ \begin{align*} & \underset{#1}{\text{minimize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optmax}[3]{ \begin{align*} & \underset{#1}{\text{maximize}} & & #2 \\ & \text{subject to} & & #3 \end{align*} } \newcommand{\optfind}[2]{ \begin{align*} & {\text{find}} & & #1 \\ & \text{subject to} & & #2 \end{align*} } $$
In this section, we study a special type of linear operator, the projection. Projections are ubiquitous in numerical linear algebra, and orthogonal projections in particular are useful tools for solving minimization problems.
A projection is an idempotent linear operator, i.e., a linear map $\pi : V \to V$ satisfying $\pi^2 = \pi$.
Facts. Let $\pi : V \to V$ be a projection. Then

1. $v \in \range{\pi}$ if and only if $\pi v = v$;
2. $I - \pi$ is also a projection;
3. $\range{I - \pi} = \null{\pi}$;
4. $\null{I - \pi} = \range{\pi}$;
5. $\range{\pi} \cap \null{\pi} = \{0\}$;
6. $V = \range{\pi} \oplus \null{\pi}$.

Fact (6) shows that a projection $\pi$ separates $V$ into a direct sum decomposition:

$$V = \range{\pi} \oplus \null{\pi}.$$
If $R$ and $N$ are any subspaces of $V$ such that $V = R \oplus N$, then there exists a projection $\pi$ such that $\range{\pi} = R$ and $\null{\pi} = N$. To see this, let $\pi : V \to V$ be the linear map such that $\pi r = r$ for $r \in R$ and $\pi n = 0$ for $n \in N$ (this determines $\pi$, since every vector in $V$ decomposes uniquely over $R$ and $N$). Then for $v \in V$, with $v = r + n$, $r \in R$, $n \in N$,

$$\pi^2 v = \pi(\pi(r + n)) = \pi r = r = \pi v,$$

so $\pi$ is indeed a projection, with $\range{\pi} = R$ and $\null{\pi} = N$.
Because $V = \range{\pi} \oplus \null{\pi}$, every $v \in V$ can be represented uniquely as $v = u + w$, with $u \in \range{\pi}$, $w \in \null{\pi}$. In fact, $u = \pi v$ and $w = (I - \pi)v$, since $\pi v = \pi u + \pi w = u$ and $(I - \pi)v = v - \pi v = w$. For this reason, $I - \pi$ is called the complementary projection corresponding to $\pi$.
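As a small numerical illustration, here is a sketch in NumPy (the particular matrix below is my own choice for the example): an oblique, i.e., non-orthogonal, projection on $\Reals^2$ is idempotent, its complement $I - \pi$ is also a projection, and together they split any vector into its two components.

import numpy as np

# An oblique (non-orthogonal) projection on R^2: it maps onto the x-axis
# along the direction (1, -1). The matrix is chosen only for illustration.
P = np.array([[1., 1.], [0., 0.]])
I = np.eye(2)

print(np.allclose(P @ P, P))                  # True: P is idempotent
print(np.allclose((I - P) @ (I - P), I - P))  # True: I - P is also a projection

v = np.array([2., 3.])
u, w = P @ v, (I - P) @ v                     # v = u + w, u in range(P), w in null(P)
print(u, w, np.allclose(u + w, v))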
Eigenvalues. To every projection $\pi : V \to V$, there corresponds a basis of $V$ consisting of eigenvectors of $\pi$; furthermore, the eigenvalues corresponding to these eigenvectors are $1$ and $0$. To see this, consider a basis $r_1, \ldots, r_k, n_1, \ldots, n_m$ of $V$ such that $r_1, \ldots, r_k$ is a basis of $\range{\pi}$ and $n_1, \ldots, n_m$ is a basis of $\null{\pi}$ (note that such a basis of $V$ is well-defined, since $V = \range{\pi} \oplus \null{\pi}$, which means that $0$ is the only vector belonging to both $\range{\pi}$ and $\null{\pi}$). The vectors $r_1, \ldots, r_k$ are evidently in the eigenspace of $\lambda = 1$, and $n_1, \ldots, n_m$ are in the eigenspace of $\lambda = 0$. Since $r_1, \ldots, r_k, n_1, \ldots, n_m$ is a list of $\dim V$ linearly independent eigenvectors of $\pi$, the eigenvalues of $\pi$ are precisely $1$ and $0$ (provided both $\range{\pi}$ and $\null{\pi}$ are nonzero).
From this characterization of the spectrum, it is clear that the trace of any projection (i.e., the sum of its eigenvalues) is equal to the dimension of the subspace into which it projects.
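For a quick numerical check of the spectrum and the trace, here is a sketch (the projection below is an orthogonal one built from a random orthonormal basis, but any projection matrix would do):

import numpy as np

# Orthogonal projection onto a random 2-dimensional subspace of R^3.
Q1 = np.linalg.qr(np.random.randn(3, 2))[0]   # orthonormal basis of the subspace
P = Q1 @ Q1.T

print(np.round(np.linalg.eigvalsh(P), 6))     # eigenvalues: 0 and 1 (here: [0, 1, 1])
print(np.round(np.trace(P), 6))               # trace = dim(range(P)) = 2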
An orthogonal projection is a projection $\pi$ satisfying $\null{\pi} = \range{\pi}^\perp$. If we write $U = \range{\pi}$, then it is customary to say that such an operator is an orthogonal projection onto $U$. For clarity, we sometimes denote the orthogonal projection onto $U$ as $\pi_U$. Because $V = U \oplus U^\perp$, the orthogonal projection onto $U$ is unique: it decomposes every $v \in V$ as $v = \pi_U v + (I - \pi_U)v$, where the first term is in $U$ and the second one is in $U^\perp$, since $\range{I - \pi_U} = \null{\pi_U} = U^\perp$.
Note that orthogonal projections satisfy $\inner{\pi v}{v - \pi v} = 0$ for all $v \in V$. Several other properties of orthogonal projections are listed below.
Orthogonal projections are self-adjoint. We can characterize orthogonal projections algebraically: a projection $\pi$ onto a subspace $U$ is orthogonal if and only if $\pi$ is self-adjoint, i.e., $\pi = \pi^*$.
To see this, first assume that $\pi$ is an orthogonal projection. Let $v$ and $w$ be two vectors in $V$, and write $v = v_1 + v_2$, $w = w_1 + w_2$, $v_1, w_1 \in U$, $v_2, w_2 \in U^\perp$. Then $\inner{\pi v}{w} = \inner{v_1}{w_1 + w_2} = \inner{v_1}{w_1}$. Similarly, $\inner{v}{\pi w} = \inner{v_1 + v_2}{w_1} = \inner{v_1}{w_1}$. This shows that $\pi = \pi^*$.
For the other direction, assume that $\pi$ is a self-adjoint projection onto $U$. Then, for all $v, w \in V$, $\inner{(I - \pi)v}{\pi w} = \inner{\pi(I - \pi)v}{w} = \inner{(\pi - \pi^2)v}{w} = 0$. That is, $\null{\pi} = \range{I - \pi}$ is orthogonal to $\range{\pi} = U$, so $\pi$ is an orthogonal projection.
Orthogonal projections are positive semidefinite. Let $\pi$ be an orthogonal projection onto some subspace. Since $\pi$ is a projection, for all $v \in V$, $\inner{\pi v}{v} = \inner{\pi^2 v}{v}$. Since $\pi$ is an orthogonal projection, $\pi = \pi^*$, so $\inner{\pi^2 v}{v} = \inner{\pi v}{\pi v} = \norm{\pi v}^2 \geq 0$. Hence $\pi$ is positive semidefinite.
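Both properties are easy to verify numerically. The sketch below uses a random subspace and checks the identity $\inner{\pi v}{v} = \norm{\pi v}^2$ from the argument above directly:

import numpy as np

# Orthogonal projection onto a random 2-dimensional subspace of R^4.
Q1 = np.linalg.qr(np.random.randn(4, 2))[0]
P = Q1 @ Q1.T

print(np.allclose(P, P.T))                                 # True: self-adjoint
v = np.random.randn(4)
print(np.allclose(v @ P @ v, np.linalg.norm(P @ v) ** 2))  # True: <Pv, v> = ||Pv||^2 >= 0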
Eigenvalue decomposition. Because any orthogonal projection $\pi_U$ onto a subspace $U$ is self-adjoint, it has an orthonormal basis of eigenvectors. Let $q_1, \ldots, q_n$ be an orthonormal basis of eigenvectors for $\pi_U$, with $q_1, \ldots, q_k$ a basis for $U$. With respect to the basis $q_1, \ldots, q_n$, the matrix of $\pi_U$ is

$$\mat{\pi_U} = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix},$$

where $I_k$ is the $k \times k$ identity block. Hence, if

$$Q = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix},$$

then the matrix of $\pi_U$ with respect to the standard basis is

$$Q \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} Q^T = Q_1 Q_1^T,$$

where $Q_1$ is obtained by dropping the last $n - k$ columns of $Q$.
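The following sketch runs this in reverse: it eigendecomposes an orthogonal projection with NumPy's eigh (the threshold 0.5 used below to pick out the eigenvalue-one eigenvectors is an arbitrary choice of mine) and recovers an orthonormal basis of $U$.

import numpy as np

# Orthogonal projection onto a random 2-dimensional subspace of R^4.
Q1 = np.linalg.qr(np.random.randn(4, 2))[0]
P = Q1 @ Q1.T

w, V = np.linalg.eigh(P)                 # eigenvalues (ascending) and orthonormal eigenvectors
basis = V[:, w > 0.5]                    # eigenvectors with eigenvalue 1 span U
print(np.allclose(basis @ basis.T, P))   # True: same projection, different basis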
Nonexpansive property. An orthogonal projection $\pi_U$ onto a subspace $U$ is nonexpansive: for all $v \in V$,

$$\norm{\pi_U v} \leq \norm{v}.$$

This is readily shown by writing $v = u + w$, $u \in U$, $w \in U^\perp$, and noting that $\norm{\pi_U v}^2 = \norm{u}^2 \leq \norm{u}^2 + \norm{w}^2 = \norm{v}^2$, where the last equality is justified by the Pythagorean theorem. Note that this also implies that $\norm{\pi_U v_1 - \pi_U v_2} = \norm{\pi_U(v_1 - v_2)} \leq \norm{v_1 - v_2}$ for all $v_1$ and $v_2$ in $V$.
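A quick numerical check of nonexpansiveness, sketched with a random subspace and random vectors:

import numpy as np

Q1 = np.linalg.qr(np.random.randn(5, 3))[0]   # orthonormal basis of a 3-dimensional subspace U
P = Q1 @ Q1.T                                 # orthogonal projection onto U

v, w = np.random.randn(5), np.random.randn(5)
print(np.linalg.norm(P @ v) <= np.linalg.norm(v))              # True: ||pi_U v|| <= ||v||
print(np.linalg.norm(P @ v - P @ w) <= np.linalg.norm(v - w))  # True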
Orthogonal projection via the QR factorization. If $q_1, \ldots, q_k$ is an orthonormal basis for a subspace $U$ of $\Reals^n$, then for each $v \in \Reals^n$,

$$\pi_U v = \sum_{i=1}^{k} \inner{q_i}{v} q_i.$$

From the above equation, it is clear that the matrix of $\pi_U$ is given by $Q_1 Q_1^T$, where $Q_1 = \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix}$. An important special case is the rank-one orthogonal projection onto the span of a unit vector $q$: the matrix of $\pi_{\span(q)}$ is $qq^T$.
The complementary projection for an orthogonal projection $\pi_U$ is given by $I - \pi_U$, which is an orthogonal projection onto $U^\perp$ since $\range{I - \pi_U} = \null{\pi_U} = U^\perp$. For example, the orthogonal projection onto $\span(q)^\perp$ is given by $I - qq^T$.
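Here is a sketch of the rank-one case, using a random vector normalized to unit length: the projection onto $\span(q)$ and its complement.

import numpy as np

q = np.random.randn(4)
q /= np.linalg.norm(q)           # unit vector, so the projector is q q^T

P = np.outer(q, q)               # orthogonal projection onto span(q)
C = np.eye(4) - P                # orthogonal projection onto span(q)^perp

v = np.random.randn(4)
print(np.allclose(P @ v, (q @ v) * q))   # True: P v = <q, v> q
print(np.allclose(q @ (C @ v), 0))       # True: C v is orthogonal to q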
To compute the orthogonal projection onto the span of a list of linearly independent vectors $a_1, \ldots, a_k \in \Reals^n$, first perform modified Gram-Schmidt to obtain a list of orthonormal vectors $q_1, \ldots, q_k$, and then use the above fact. Equivalently, arrange the $a_i$ as the columns of a matrix $A \in \Reals^{n \times k}$, and write

$$A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

via the full QR factorization. Then $Q_1 Q_1^T$ is the orthogonal projection onto $\range{A}$, and $Q_2 Q_2^T$ is the orthogonal projection onto $\range{A}^\perp$. Because orthogonal projections are unique, $Q_2 Q_2^T = I - Q_1 Q_1^T$, i.e., the complementary projection projects onto $\range{A}^\perp$. You can play around with these facts in code:
import numpy as np

# A 4 x 3 matrix with linearly independent columns, so that the columns
# span a 3-dimensional subspace of R^4.
A = np.array([[1., 0., 2.], [0., 1., 1.], [1., 1., 0.], [2., 0., 1.]])
Q1 = np.linalg.qr(A)[0]   # reduced QR: orthonormal basis for range(A)
pi = Q1 @ Q1.T            # orthogonal projection onto range(A)
cmpl = np.eye(4) - pi     # complementary projection, onto range(A)^perp
v = np.random.randn(4)
print(A.T @ (cmpl @ v))   # cmpl @ v lies in null(A^T), so this is ~0
prints a vector of three entries that are zero up to floating-point roundoff (on the order of $10^{-16}$), confirming that cmpl @ v is orthogonal to the columns of A.
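Since the text appeals to modified Gram-Schmidt, here is a minimal sketch of it (the function name mgs is mine), checked against the QR-based projector; the orthonormal bases may differ, but the projection matrix is unique.

import numpy as np

def mgs(A):
    # Modified Gram-Schmidt: returns a matrix whose columns are an
    # orthonormal basis for range(A). Assumes linearly independent columns.
    A = np.array(A, dtype=float)
    n, k = A.shape
    Q = np.zeros((n, k))
    for i in range(k):
        Q[:, i] = A[:, i] / np.linalg.norm(A[:, i])
        for j in range(i + 1, k):
            A[:, j] -= (Q[:, i] @ A[:, j]) * Q[:, i]
    return Q

A = np.random.randn(4, 3)
Q = mgs(A)
Q1 = np.linalg.qr(A)[0]
print(np.allclose(Q @ Q.T, Q1 @ Q1.T))   # True: same orthogonal projection onto range(A)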
Orthogonal projection with an arbitrary basis. We can express the orthogonal projection onto the range of a matrix $A \in \Reals^{m \times n}$, $m \geq n$, $\rank(A) = n$, in terms of $A$ itself. Let $\hat{b}$ be the projection of $b \in \Reals^m$ onto $\range{A}$. Then the vector $b - \hat{b}$ must be in $\range{A}^\perp = \null{A^T}$, that is, $A^T(b - \hat{b}) = 0$. Since $\hat{b} \in \range{A}$, there exists an $x \in \Reals^n$ such that $\hat{b} = Ax$, so we can write the previous equation as $A^T(b - Ax) = 0$, or $A^TAx = A^Tb$. This lets us conclude that the projection of $b$ onto $\range{A}$ is

$$\hat{b} = Ax = A(A^TA)^{-1}A^Tb.$$

Note that if we substitute $Q_1R_1$ for $A$ (where $Q_1$ and $R_1$ are the block matrices from $A$'s full QR factorization), we obtain the familiar expression $Q_1Q_1^Tb$ for the projection of $b$ onto $\range{A}$.
The matrix $(A^TA)^{-1}A^T$ is called the (left) pseudo-inverse of $A$; it is denoted $A^\dagger$. $A^\dagger$ exists if and only if $A$ has at least as many rows as columns and is full-rank (because the Gram matrix $A^TA$ is invertible if and only if $A$ has linearly independent columns). If $b \in \range{A}$, then $x = A^\dagger b$ is a solution to the linear equation $Ax = b$, since $AA^\dagger b$ is the projection of $b$ onto $\range{A}$, which in this case is $b$ itself.
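A sketch of the pseudo-inverse in code (the matrix is random, hence full column rank with probability one; in practice one would call np.linalg.pinv or a least squares solver rather than forming the inverse explicitly):

import numpy as np

A = np.random.randn(5, 3)                        # tall, full column rank (almost surely)
A_dagger = np.linalg.inv(A.T @ A) @ A.T          # left pseudo-inverse (A^T A)^{-1} A^T

print(np.allclose(A_dagger, np.linalg.pinv(A)))  # True: matches the Moore-Penrose pseudo-inverse
print(np.allclose(A_dagger @ A, np.eye(3)))      # True: a left inverse of A

b = A @ np.random.randn(3)                       # a vector in range(A)
print(np.allclose(A @ (A_dagger @ b), b))        # True: x = A^+ b solves A x = b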
Minimum distance to a subspace. Let $U$ be a subspace of $V$, $v \in V$. Then for all $u \in U$,

$$\norm{v - \pi_U v} \leq \norm{v - u},$$

i.e., $\pi_U v$ is the vector in $U$ that is closest to $v$ (with respect to the induced norm). To see why, first note that

$$\norm{v - u}^2 = \norm{(v - \pi_U v) + (\pi_U v - u)}^2.$$

Because $v - \pi_U v \in U^\perp$ and $\pi_U v - u \in U$, the righthand side is equal to $\norm{v - \pi_U v}^2 + \norm{\pi_U v - u}^2 \geq \norm{v - \pi_U v}^2$, by the Pythagorean theorem. This completes the proof.
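The following sketch samples a few points of a random subspace $U$ and confirms that none of them is closer to $v$ than $\pi_U v$:

import numpy as np

Q1 = np.linalg.qr(np.random.randn(5, 2))[0]   # orthonormal basis of a subspace U
P = Q1 @ Q1.T                                 # orthogonal projection onto U

v = np.random.randn(5)
best = np.linalg.norm(v - P @ v)
others = [np.linalg.norm(v - Q1 @ np.random.randn(2)) for _ in range(1000)]
print(best <= min(others))                    # True: pi_U v beats every sampled point of U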
This fact is the key to solving least squares problems, in which we are given a matrix $A \in \Reals^{m \times n}$ and a vector $b \in \Reals^m$, and the goal is to find a vector $x \in \Reals^n$ that minimizes $\norm{Ax - b}$. We know that any $x$ that minimizes this expression must be such that $Ax$ is the orthogonal projection of $b$ onto the range of $A$. If $m \geq n$ and $\rank(A) = n$, then the solution to this problem is unique, and it is given by $x = A^\dagger b = (A^TA)^{-1}A^Tb$ (since, as proved in the previous section, $AA^\dagger b$ is the projection of $b$ onto $\range{A}$).
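Finally, a sketch of a least squares solve, comparing the normal-equations solution $(A^TA)^{-1}A^Tb$ with NumPy's least squares solver and checking that the residual is orthogonal to $\range{A}$:

import numpy as np

A = np.random.randn(6, 3)                       # tall, full column rank (almost surely)
b = np.random.randn(6)

x_normal = np.linalg.inv(A.T @ A) @ A.T @ b     # x = A^+ b via the normal equations
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]  # NumPy's least squares solver

print(np.allclose(x_normal, x_lstsq))           # True
print(np.allclose(A.T @ (b - A @ x_normal), 0)) # True: residual is orthogonal to range(A)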