Lecture Notes

15. Singular Value Decomposition

Part of the Series on Linear Algebra.

By Akshay Agrawal. Last updated Dec. 20, 2018.


One of the central aims of linear algebra is to find bases with respect to which the matrices of linear maps are simple. We saw in a previous section that every self-adjoint linear map has a diagonal matrix with respect to an orthonormal basis consisting of eigenvectors. In this section, we will see that every linear map $A : \mathbf{R}^n \to \mathbf{R}^m$ has a diagonal matrix with respect to orthonormal bases $(v_1, \ldots, v_n)$ and $(u_1, \ldots, u_m)$ for the domain and range of $A$, respectively. The expression of the matrix of $A$ with respect to these bases is called the singular value decomposition (SVD) of $A$; it is analogous to the eigenvector decomposition for symmetric matrices. The upshot is this: in terms of the bases $(v_1, \ldots, v_n)$ and $(u_1, \ldots, u_m)$, the linear map simply dilates some components of the input and shrinks others, while possibly truncating components or appending zeros to account for passing from $\mathbf{R}^n$ to $\mathbf{R}^m$ when $m \neq n$.

The SVD

Let $A \in \mathbf{R}^{m \times n}$ be any real matrix. There exist orthogonal matrices $U \in \mathbf{R}^{m \times m}$ and $V \in \mathbf{R}^{n \times n}$ and a diagonal matrix $\Sigma \in \mathbf{R}^{m \times n}$ such that $A = U \Sigma V^T$. It is conventional to arrange the diagonal entries $\sigma_1, \ldots, \sigma_p$ of $\Sigma$, which are nonnegative, in decreasing order $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0$, where $p = \min\{m, n\}$. The positive $\sigma_i$ are called the singular values of $A$, and the columns of $U$ and $V$ are its left and right singular vectors. The expression $A = U \Sigma V^T$ is called the SVD of $A$. This means that with respect to the orthonormal bases $(v_1, \ldots, v_n)$ and $(u_1, \ldots, u_m)$ of $\mathbf{R}^n$ and $\mathbf{R}^m$ respectively, the linear map $x \mapsto Ax$ has a diagonal matrix, namely, $\Sigma$. Geometrically, the SVD shows that a linear map first rotates its input by $V^T$, scales each axis $i$ by $\sigma_i$, appends zeros (if $m > n$) or truncates (if $m < n$) to obtain an $m$-vector, and finally rotates the result by $U$.
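
As a quick numerical illustration (not part of the original notes), the following NumPy sketch computes a full SVD of a small random matrix and checks the factorization and the orthogonality of $U$ and $V$; the dimensions and variable names are arbitrary.

    import numpy as np

    # A small rectangular matrix (m = 4 rows, n = 3 columns).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))

    # Full SVD: U is 4x4 orthogonal, Vt is 3x3 orthogonal, s holds the
    # singular values in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=True)

    # Embed the singular values in a 4x3 diagonal matrix Sigma.
    Sigma = np.zeros((4, 3))
    Sigma[:3, :3] = np.diag(s)

    # Check A = U Sigma V^T and the orthogonality of U and V.
    assert np.allclose(A, U @ Sigma @ Vt)
    assert np.allclose(U.T @ U, np.eye(4))
    assert np.allclose(Vt @ Vt.T, np.eye(3))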

If we drop the zero rows and columns of $\Sigma$, along with the corresponding columns $u_{r+1}, \ldots, u_m$ of $U$ and $v_{r+1}, \ldots, v_n$ of $V$ (where $r$ is the rank of $A$), we obtain what is called the thin SVD, $A = U_1 \Sigma_1 V_1^T$, where $U_1 \in \mathbf{R}^{m \times r}$, $\Sigma_1 \in \mathbf{R}^{r \times r}$, and $V_1 \in \mathbf{R}^{n \times r}$. In block diagram form, the thin SVD is

    $A = \underbrace{U_1}_{m \times r} \, \underbrace{\Sigma_1}_{r \times r} \, \underbrace{V_1^T}_{r \times n}.$

Note that $U_1$ and $V_1$ have orthonormal columns, so $U_1^T U_1 = I$ and $V_1^T V_1 = I$; but for $r < m$, $U_1 U_1^T \neq I$, and for $r < n$, $V_1 V_1^T \neq I$.
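
A minimal sketch of the thin SVD, assuming NumPy's economy-size option full_matrices=False and an arbitrary tolerance of 1e-10 for deciding which singular values are zero:

    import numpy as np

    rng = np.random.default_rng(0)
    # A rank-2 matrix with m = 5, n = 3.
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))

    # Economy-size SVD: U1 is 5x3, s has 3 entries, V1t is 3x3.
    U1, s, V1t = np.linalg.svd(A, full_matrices=False)

    # Keep only the r columns/rows associated with nonzero singular values.
    r = int(np.sum(s > 1e-10))
    U1, s1, V1t = U1[:, :r], s[:r], V1t[:r, :]

    # Thin SVD: A = U1 diag(s1) V1^T with U1 (5 x r) and V1 (3 x r).
    assert np.allclose(A, U1 @ np.diag(s1) @ V1t)
    # U1 has orthonormal columns, but U1 U1^T is not the identity.
    assert np.allclose(U1.T @ U1, np.eye(r))
    assert not np.allclose(U1 @ U1.T, np.eye(5))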

Construction

The SVD is closely related to the eigendecomposition of a symmetric matrix. To construct an SVD of a matrix $A \in \mathbf{R}^{m \times n}$ of rank $r$, we need an orthonormal basis $(v_1, \ldots, v_n)$ of $\mathbf{R}^n$ and an orthonormal basis $(u_1, \ldots, u_m)$ of $\mathbf{R}^m$ such that $A v_i = \sigma_i u_i$ for $i = 1, \ldots, p$. We will take $(v_1, \ldots, v_n)$ to be an orthonormal basis for $\mathbf{R}^n$ consisting of eigenvectors of $A^T A$, and we will take $u_1, \ldots, u_r$ to be normalized images of these vectors under $A$.

Let $A^T A = V \Lambda V^T$ be the eigendecomposition of $A^T A$, so that the columns $v_1, \ldots, v_n$ of $V$ are orthonormal eigenvectors of $A^T A$ and the diagonal entries $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ of $\Lambda$ are arranged in decreasing order. Clearly $(v_1, \ldots, v_n)$ is an orthonormal basis for $\mathbf{R}^n$. Notice that

    $(A v_i)^T (A v_j) = v_i^T A^T A v_j = \lambda_j v_i^T v_j = 0, \quad i \neq j,$

so $(A v_1, \ldots, A v_n)$ is a list of mutually orthogonal vectors; moreover, the nonzero vectors in this list are a basis for the range of $A$. Since $A^T A$ is positive semidefinite, the $\lambda_i$ are all nonnegative; since $A$ has rank $r$ (and since the rank of a matrix equals the rank of its corresponding Gram matrix), $\lambda_1 \geq \cdots \geq \lambda_r > 0$ and $\lambda_{r+1} = \cdots = \lambda_n = 0$. Also, $A v_i$ is nonzero for $i \leq r$ and zero otherwise, since $\|A v_i\|_2^2 = v_i^T A^T A v_i = \lambda_i$. Hence, $(A v_1, \ldots, A v_r)$ is a basis for the range of $A$. Define

    $\sigma_i = \sqrt{\lambda_i}, \qquad u_i = \frac{A v_i}{\sigma_i}, \qquad i = 1, \ldots, r,$

so that $(u_1, \ldots, u_r)$ is an orthonormal list of vectors spanning $\mathrm{range}(A)$. If $r < m$, we can extend this list to an orthonormal basis $(u_1, \ldots, u_m)$ of $\mathbf{R}^m$. Forming the matrix $U$ with columns $u_1, \ldots, u_m$ and taking $\Sigma \in \mathbf{R}^{m \times n}$ to be the diagonal matrix with diagonal entries $\sigma_1, \ldots, \sigma_r, 0, \ldots, 0$ gives an SVD $A = U \Sigma V^T$ of $A$.
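
The construction can be mirrored numerically. The sketch below is illustrative only and assumes $A$ has full column rank, so that every $\lambda_i$ is positive and no basis extension is needed:

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 5, 3
    A = rng.standard_normal((m, n))   # full column rank with probability 1

    # Eigendecomposition of the Gram matrix A^T A (symmetric PSD).
    lam, V = np.linalg.eigh(A.T @ A)
    # eigh returns eigenvalues in increasing order; flip to decreasing.
    lam, V = lam[::-1], V[:, ::-1]

    # Singular values are the square roots of the eigenvalues.
    sigma = np.sqrt(np.clip(lam, 0, None))

    # Left singular vectors are the normalized images A v_i / sigma_i.
    U1 = (A @ V) / sigma              # divides column i by sigma_i

    # Thin SVD assembled from the construction.
    assert np.allclose(A, U1 @ np.diag(sigma) @ V.T)
    assert np.allclose(U1.T @ U1, np.eye(n))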

Relationship to eigenvectors of $A^T A$ and $A A^T$

Because $A^T A = V \Sigma^T \Sigma V^T$, the right singular vectors $v_i$ are orthonormal eigenvectors of $A^T A$, with eigenvalues $\sigma_i^2$.

Similarly, since $A A^T = U \Sigma \Sigma^T U^T$, the left singular vectors $u_i$ are orthonormal eigenvectors of $A A^T$, with eigenvalues $\sigma_i^2$.
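
A numerical spot-check of these identities (illustrative, with an arbitrary random matrix):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3))
    U, s, Vt = np.linalg.svd(A)

    # Eigenvalues of A^T A are the squared singular values...
    evals_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    assert np.allclose(evals_AtA, s**2)

    # ...and so are the nonzero eigenvalues of A A^T.
    evals_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
    assert np.allclose(evals_AAt[:3], s**2)
    assert np.allclose(evals_AAt[3:], 0)

    # The right singular vectors diagonalize A^T A.
    assert np.allclose(Vt @ (A.T @ A) @ Vt.T, np.diag(s**2))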

Subspaces spanned by the singular vectors

Since $(v_1, \ldots, v_n)$ is a basis for $\mathbf{R}^n$, $A v_i = \sigma_i u_i$ for $i \leq r$ (where $\sigma_i > 0$), and $A v_i = 0$ for $i > r$, $(u_1, \ldots, u_r)$ is a basis of $\mathrm{range}(A)$. Because $(v_{r+1}, \ldots, v_n)$ is an orthonormal list of $n - r$ vectors lying in the null space of $A$, which has dimension $n - r$, it is a basis for $\mathrm{null}(A)$.

Similarly, $(v_1, \ldots, v_r)$ is a basis of $\mathrm{range}(A^T)$ and $(u_{r+1}, \ldots, u_m)$ is a basis of $\mathrm{null}(A^T)$.
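
These subspace identifications can be verified numerically; the sketch below uses an arbitrary rank-2 matrix and an arbitrary tolerance of 1e-10 for the numerical rank.

    import numpy as np

    rng = np.random.default_rng(3)
    # A 5 x 4 matrix of rank 2.
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10))        # numerical rank

    # v_{r+1}, ..., v_n span the null space of A: A v_i = 0.
    assert np.allclose(A @ Vt[r:, :].T, 0)
    # u_1, ..., u_r span the range of A: projecting each column of A
    # onto their span leaves A unchanged.
    Ur = U[:, :r]
    assert np.allclose(Ur @ (Ur.T @ A), A)
    # u_{r+1}, ..., u_m span the null space of A^T.
    assert np.allclose(A.T @ U[:, r:], 0)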

Image of the unit ball

The SVD provides a compelling geometric description of linear transformations: every linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ deforms the unit ball in $\mathbf{R}^n$ into an ellipsoid in $\mathbf{R}^m$.

The unit ball in $\mathbf{R}^n$ can be represented as the set of linear combinations of $v_1, \ldots, v_n$ whose coefficient vectors have norm at most one, i.e.,

    $B = \{ x_1 v_1 + \cdots + x_n v_n : \|x\|_2 \leq 1 \}$

(since $\|x_1 v_1 + \cdots + x_n v_n\|_2 = \|x\|_2$ because the $v_i$ are orthonormal). The image of a vector $x_1 v_1 + \cdots + x_n v_n$ is $\sigma_1 x_1 u_1 + \cdots + \sigma_r x_r u_r$. Define $y_i = \sigma_i x_i$ for $i = 1, \ldots, r$. Then the image of $B$ under $A$ contains all vectors of the form $y_1 u_1 + \cdots + y_r u_r$ such that

    $\frac{y_1^2}{\sigma_1^2} + \cdots + \frac{y_r^2}{\sigma_r^2} \leq 1.$

That is, the image of $B$ under $A$ is an ellipsoid with semi-axes $u_1, \ldots, u_r$ of lengths $\sigma_1, \ldots, \sigma_r$, respectively. Hence, every linear transformation collapses $n - r$ dimensions, deforms the unit ball in $\mathbf{R}^n$ into an ellipsoid, and embeds the ellipsoid into $\mathbf{R}^m$. This characterization also shows that $\|A\|_2 = \max_{\|x\|_2 \leq 1} \|Ax\|_2 = \sigma_1$ (which is achieved by $x = v_1$).
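
The claim that $\|A\|_2 = \sigma_1$, attained at $v_1$, can be checked by sampling; this sketch is illustrative only, and the sample size is arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((3, 2))
    U, s, Vt = np.linalg.svd(A)

    # Sample many unit vectors x and record ||Ax||; the maximum should
    # approach sigma_1, attained (up to sampling error) near x = v_1.
    X = rng.standard_normal((2, 10000))
    X /= np.linalg.norm(X, axis=0)      # columns are unit vectors
    norms = np.linalg.norm(A @ X, axis=0)

    print("sigma_1            :", s[0])
    print("max ||Ax|| sampled :", norms.max())
    print("||A v_1||          :", np.linalg.norm(A @ Vt[0]))   # equals sigma_1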

As a sum of outer products

Any matrix product $BC$ can be expressed as a sum of outer products $\sum_i b_i c_i^T$, where the $b_i$ are the columns of $B$ and the $c_i^T$ are the rows of $C$. Since

    $A = U \Sigma V^T = (U \Sigma) V^T,$

and the $i$th column of $U \Sigma$ is $\sigma_i u_i$ while the $i$th row of $V^T$ is $v_i^T$, the matrix $A$ can be expressed as $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$.

Since $Ax = \sum_{i=1}^{r} \sigma_i u_i (v_i^T x)$, $Ax$ expands in the basis $(u_1, \ldots, u_m)$ with components $\sigma_i (v_i^T x)$. The scalars $v_i^T x$ are just the components of $x$ in the orthonormal basis $(v_1, \ldots, v_n)$, so this representation makes clear that $A$ just scales the components of $x$ when passing from the basis $(v_1, \ldots, v_n)$ to the basis $(u_1, \ldots, u_m)$.
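
A short check of the outer-product expansion and of the change-of-basis interpretation (illustrative, with arbitrary dimensions):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 3))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Rebuild A one outer product at a time: A = sum_i sigma_i u_i v_i^T.
    A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
    assert np.allclose(A, A_rebuilt)

    # Components of Ax in the u-basis are sigma_i times the components
    # of x in the v-basis.
    x = rng.standard_normal(3)
    assert np.allclose(U.T @ (A @ x), s * (Vt @ x))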

Moore-Penrose inverse

The matrix $A^{\dagger} = V \Sigma^{\dagger} U^T$, where $\Sigma^{\dagger}$ is obtained by transposing $\Sigma$ and inverting its nonzero entries, is called the Moore-Penrose inverse of $A$. When $A$ is tall and full-rank, $A^{\dagger} = (A^T A)^{-1} A^T$; when it is short and full-rank, it equals $A^T (A A^T)^{-1}$. The Moore-Penrose inverse can be used to obtain minimum-norm solutions to least-squares problems, as we will see in a subsequent section.
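
The sketch below forms the Moore-Penrose inverse from the SVD and, under the assumption that $A$ is tall with full column rank, compares it against the formula $(A^T A)^{-1} A^T$ and NumPy's built-in pinv:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((5, 3))       # tall, full column rank (a.s.)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert the (all nonzero, by assumption) singular values.
    A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

    # Agrees with NumPy's pinv and, for tall full-rank A, with (A^T A)^{-1} A^T.
    assert np.allclose(A_pinv, np.linalg.pinv(A))
    assert np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T)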

Error analysis

Suppose that $A \in \mathbf{R}^{n \times n}$ is invertible, and that we seek $x$ such that $Ax = y$. Of course, $x = A^{-1} y$. But suppose that our measurement of $y$ is corrupted so that we observe $y + \delta y$, so we compute $x + \delta x = A^{-1}(y + \delta y)$ with $\delta x = A^{-1} \delta y$ ($\delta x$ and $\delta y$ should be interpreted as infinitesimals). Then the error in $x$, $\|\delta x\|_2$, is at most $\|A^{-1}\|_2 \|\delta y\|_2 = \|\delta y\|_2 / \sigma_n$. Since $\|y\|_2 = \|Ax\|_2 \leq \sigma_1 \|x\|_2$, it is also the case that $1 / \|x\|_2 \leq \sigma_1 / \|y\|_2$. Then

    $\frac{\|\delta x\|_2}{\|x\|_2} \leq \frac{\sigma_1}{\sigma_n} \cdot \frac{\|\delta y\|_2}{\|y\|_2},$

or equivalently

    $\frac{\|\delta x\|_2 / \|x\|_2}{\|\delta y\|_2 / \|y\|_2} \leq \frac{\sigma_1}{\sigma_n}.$

The number $\sigma_1 / \sigma_n$ is called the condition number of $A$; it is denoted $\kappa(A)$, and it is equal to $\|A\|_2 \|A^{-1}\|_2$. For nonsquare matrices, the condition number is defined in terms of the pseudoinverse: $\kappa(A) = \|A\|_2 \|A^{\dagger}\|_2$.

If $\kappa(A)$ is small, then we say that $A$ is well-conditioned; otherwise, we say that it is ill-conditioned. When the condition number of a matrix is large, small perturbations in the measurement of $y$ can lead to large perturbations in the estimate of $x$. If $\kappa(A)$ is extremely large, then $A$ should be treated as singular for practical purposes, even if it is invertible.
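
To make the amplification concrete, the following sketch builds an ill-conditioned matrix with singular values 1 and 1e-6 (both chosen arbitrarily) and perturbs $y$ along the left singular vector associated with the smallest singular value:

    import numpy as np

    # An ill-conditioned 2x2 matrix with singular values 1 and 1e-6.
    U, _ = np.linalg.qr(np.array([[1.0, 1.0], [1.0, -1.0]]))
    V, _ = np.linalg.qr(np.array([[2.0, 1.0], [1.0, 3.0]]))
    A = U @ np.diag([1.0, 1e-6]) @ V.T

    kappa = np.linalg.cond(A)             # equals sigma_1 / sigma_2
    print("condition number:", kappa)

    x = np.array([1.0, 1.0])
    y = A @ x

    # Perturb y along u_2, the direction A^{-1} amplifies the most.
    dy = 1e-8 * U[:, 1]
    dx = np.linalg.solve(A, y + dy) - x

    rel_in = np.linalg.norm(dy) / np.linalg.norm(y)
    rel_out = np.linalg.norm(dx) / np.linalg.norm(x)
    print("relative error in y :", rel_in)
    print("relative error in x :", rel_out)
    print("bound kappa * rel_in:", kappa * rel_in)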

Low-rank approximations

Suppose $A \in \mathbf{R}^{m \times n}$, $\mathrm{rank}(A) = r$, and $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$. We seek a matrix $\hat{A}$ of rank $k$, $k < r$, that best approximates $A$ with respect to the operator norm. That is, we seek $\hat{A}$ minimizing $\|A - \hat{A}\|_2$.

It turns out that the best rank-$k$ approximation is $\hat{A} = \sum_{i=1}^{k} \sigma_i u_i v_i^T$ (in fact, $\hat{A}$ is also the best rank-$k$ approximation with respect to the Frobenius norm, though we won't show this here). This means that the outer products $\sigma_i u_i v_i^T$ are ordered by importance; this is intuitive, since $\sigma_1 u_1 v_1^T$ corresponds to the direction of maximal stretch in the domain of $A$. Notice that $\|A - \hat{A}\|_2 = \sigma_{k+1}$. In particular, for square invertible $A$, $\sigma_n$ can be interpreted as the distance from $A$ to the nearest singular matrix; if $\sigma_n$ is small, then $A$ is nearly singular.
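
A numerical check of the best rank-$k$ approximation and of the identity $\|A - \hat{A}\|_2 = \sigma_{k+1}$ (the dimensions and the choice of $k$ are arbitrary):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((6, 5))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    # Best rank-k approximation: keep the k largest singular triples.
    A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    assert np.linalg.matrix_rank(A_hat) == k
    # The operator-norm error equals the (k+1)-st singular value.
    err = np.linalg.norm(A - A_hat, ord=2)
    assert np.isclose(err, s[k])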

To see why $\hat{A}$ is in fact an optimal rank-$k$ approximation of $A$, let $B$ be any matrix with rank at most $k$. The dimension of the null space of $B$ is at least $n - k$; $(v_1, \ldots, v_{k+1})$ spans a $(k+1)$-dimensional subspace of $\mathbf{R}^n$. Hence there exists a unit vector $z$ that is in the intersection of $\mathrm{null}(B)$ and $\mathrm{span}(v_1, \ldots, v_{k+1})$. Then

    $\|(A - B)z\|_2^2 = \|Az\|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 (v_i^T z)^2 \geq \sigma_{k+1}^2 \sum_{i=1}^{k+1} (v_i^T z)^2 = \sigma_{k+1}^2,$

so $\|A - B\|_2 \geq \sigma_{k+1} = \|A - \hat{A}\|_2$ (in the above, the second equality uses the fact that $(u_1, \ldots, u_{k+1})$ is an orthonormal list of vectors, the inequality uses $\sigma_i \geq \sigma_{k+1}$ for $i \leq k + 1$, and the last equality holds since $\sum_{i=1}^{k+1} (v_i^T z)^2 = \|z\|_2^2 = 1$).
