Singular-Value Decomposition

\(AA^\mathsf{T}\) and \(A^\mathsf{T}A\)

Let \(A \in \mathbb{R}^{m \times n}\). Clearly, both \(AA^\mathsf{T}\) and \(A^\mathsf{T}A\) are real symmetric matrices and so they have only real eigenvalues and are diagonalizable. But a bit more can be said about their eigenvalues.

First, one can show that all the eigenvalues are nonnegative.

Next, every positive eigenvalue of \(A^\mathsf{T}A\) is also an eigenvalue of \(AA^\mathsf{T}\). Indeed, if \(v\) is an eigenvector of \(A^\mathsf{T}A\) with eigenvalue \(\lambda\), with \(u = A v\), we have \(AA^\mathsf{T}u = A(A^\mathsf{T}A)v = A(\lambda v) = \lambda Av = \lambda u\). Note that \(u \neq 0\) or else \(0 = u^\mathsf{T} u = (Av)^\mathsf{T} (Av) = v^\mathsf{T}A^\mathsf{T}A v = v^\mathsf{T}(\lambda v) = \lambda v^\mathsf{T}v = \lambda \|v\|^2 \neq 0\), a contradiction. Hence, \(u\) is an eigenvector of \(AA^\mathsf{T}\) with eigenvalue \(\lambda\).

One can apply a similar argument to show that every positive eigenvalue of \(AA^\mathsf{T}\) is also an eigenvalue of \(A^\mathsf{T}A\). As a result, \(AA^\mathsf{T}\) and \(A^\mathsf{T}A\) have the same eigenvalues. But more is true: The eigenvalues have the same multiplicities. (There is no need to distinguish between algebraic multiplicity and geometric multiplicity here because real symmetric matrices are diagonalizable.) To this end, it suffices to show the following:

Let \(v_1,\ldots, v_k\) be a basis for the eigenspace of \(A^\mathsf{T}A\) for the positive eigenvalue \(\lambda\). Then the vectors \(u_1 = Av_1,\ldots, u_k = Av_k\) are linearly independent.

(As shown above, \(u_i\) is an eigenvector of \(AA^\mathsf{T}\) with eigenvalue \(\lambda\) for each \(i = 1,\ldots, k\). Thus, if \(u_1,\ldots, u_k\) are linearly independent, the geometric multiplicity of \(\lambda\) with respect to \(AA^\mathsf{T}\) will be at least \(k\). Applying a similar argument going the other way, we obtain that the geometric multiplicities of \(\lambda\) with respect to \(A^\mathsf{T}A\) and \(AA^\mathsf{T}\) are equal.)

Suppose to the contrary that \(u_1,\ldots, u_k\) are not linearly independent. Then there exist \(\alpha_1,\ldots, \alpha_k \in \mathbb{R}\), not all equal to zero, such that \[\alpha_1 u_1 + \cdots + \alpha_k u_k = 0.\]

Multiplying both sides on the left by \(A^\mathsf{T}\), we obtain \[\alpha_1 A^\mathsf{T} u_1 + \cdots + \alpha_k A^\mathsf{T} u_k = 0,\] or equivalently, \[(\alpha_1 \lambda)v_1 + \cdots + (\alpha_k\lambda) v_k = 0.\]

As \(\lambda \gt 0\) and not all \(\alpha_1,\ldots, \alpha_k \in \mathbb{R}\), are equal to zero, we see that not all \(\alpha_1\lambda,\ldots, \alpha_k\lambda\) are equal to zero, implying that \(v_1,\ldots,v_k\) are not linearly independent, a contradiction.

Singular value decomposition (SVD)

Let \(\lambda_1,\ldots, \lambda_r\) be the positive eigenvalues of \(A^\mathsf{T}A\), listing multiplicities, such that \(\lambda_1 \geq \cdots \geq \lambda_r\). Then \(\sigma_i = \sqrt{\lambda_i}\) for \(i = 1,\ldots, r\) are called the singular values of \(A\).

Take an orthogonal diagonalization of \(A^\mathsf{T}A\) given by \(V D V^\mathsf{T}\) where the first \(r\) diagonal entries of \(D\) are \(\lambda_1,\ldots, \lambda_r\). Let \(u_i = \frac{1}{\sigma_i} Av_i\) for \(i = 1,\ldots r\). Extend \(\{u_1,\ldots,u_r\}\) to an orthonormal basis \(\{u_1,\ldots,u_m\}\) of \(\mathbb{R}^{m}\) such that \(\{u_{r+1},\ldots,u_m\}\) is a basis for the nullspace of \(A^\mathsf{T}\). (Note that \(AA^\mathsf{T} = U D' U^\mathsf{T}\) where \(D'\) is a diagonal matrix with the first \(r\) diagonal entries given by \(\lambda_1,\ldots,\lambda_r\) and the remaining entries equal to 0.)

Let \(\Sigma\) be an \(m\times n\) matrix such that \(\Sigma_{i,i} = \sigma_i\) for \(i = 1,\ldots,r\) and the remaining entries equal to 0. Then, one can check that \[AV = U \Sigma.\] Since \(V\) is orthogonal, multiplying both sides on the right by \(V^\mathsf{T}\), we obtain \[A = U \Sigma V^\mathsf{T}.\] This is called a singular value decomposition of \(A\).

Example

Let \(A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1\end{bmatrix}\). We will obtain an svd of \(A\). First, we obtain the eigenvalues of \(A^\mathsf{T}A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1\end{bmatrix}\) by finding roots of the characteristic polynomial \begin{eqnarray*} \left | \begin{array}{ccc} 1 -\lambda & 1 & 0 \\ 1 & 2-\lambda & 1 \\ 0 & 1 & 1-\lambda \end{array} \right | = (1-\lambda) \left | \begin{array}{cc} 2-\lambda & 1 \\ 1 & 1-\lambda \end{array} \right | - \left | \begin{array}{ccc} 1 & 1 \\ 0 & 1-\lambda \end{array} \right | = -\lambda(\lambda - 3)(\lambda -1 ) \end{eqnarray*} Hence, the eigenvalues are \(3,1,0\). We set \(\sigma_1 = \sqrt{3}\) and \(\sigma_2 = 1\).

The RREF of \(A^\mathsf{T}A - 3I\) is \(\begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & -2 \\ 0 & 0 & 0\end{bmatrix}\). Hence, an orthonormal basis for \(N(A^\mathsf{T}A - 3I)\) is given by the single vector \(v_1 = \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}}\end{bmatrix}\). Set \(u_1 = \frac{1}{\sigma_1} A v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \).

The RREF of \(A^\mathsf{T}A - I\) is \(\begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 0 \\ 0 & 0 & 0\end{bmatrix}\). Hence, an orthonormal basis for \(N(A^\mathsf{T}A - I)\) is given by the single vector \(v_2 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}}\end{bmatrix}\). Set \(u_2 = \frac{1}{\sigma_2} A v_2 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \).

The RREF of \(A^\mathsf{T}A\) is \(\begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & 1 \\ 0 & 0 & 0\end{bmatrix}\). Hence, an orthonormal basis for \(N(A^\mathsf{T}A)\) is given by the single vector \(v_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}}\end{bmatrix}\).

Setting \(U = [u_1~u_2]\), \(\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0\\ 0 & \sigma_2 & 0 \end{bmatrix}\), and \(V = [v_1~v_2~v_3]\), we get \(A = U\Sigma V^\mathsf{T}\) with \(U\) and \(V\) orthogonal.

The calculations involved for obtaining a SVD are difficult to carry out by hand. As a result, one often uses a numerical computing software package to compute SVDs. In Matlab or Octave, there is a command called svd that computes a SVD for a given matrix.

Low-rank matrix approximation using SVD

Let \(A \in \mathbb{R}^{m\times n}\). Let \(r\) be a positive integer less than the minimum of \(m\) and \(n\). Consider an svd of \(A\) given by \(U \Sigma V^\mathsf{T}\).

Let \(U_r\) and \(V_r\) denote the matrices obtained from the first \(r\) columns of \(U\) and \(V\), respectively. Let \(\Sigma_r\) be an \(r\times r\) diagonal matrix such that the \(i\)th diagonal entry is given by \(\Sigma_{i,i}\) for \(i = 1,\ldots r\).

The Eckart-Young Theorem states that the matrix \(M = U_r \Sigma_r V_r^\mathsf{T}\) is a rank-\(r\) matrix that minimizes \(\|A-M\|_F\). In other words, there is no other \(m\times n\) matrix of rank \(r\) that has distance closer to \(A\) than \(M\) with distance given by the Frobenius norm. The proof of this is beyond the scope of these notes and will not be covered here. (For the matrix \(A\) in the example above \(u_1 \sigma_1 v_1^\mathsf{T} = \begin{bmatrix} \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{2} & 1 & \frac{1}{2}\end{bmatrix}\) is a rank-\(1\) approximation of \(\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1\end{bmatrix}\). What is the distance between these two matrices?)

Other applications of SVD

There are in fact many uses of singular value decomposition other than image compression. You can find out more at the Wikipedia.