Chapter 5 Eigenvectors and Eigenvalues
5.1 Eigenvectors and Eigenvalues
Definition 5.1 Let \(f\) be a linear map from \(\mathbf{K}^n\) to \(\mathbf{K}^n\). Let \(\lambda\in\mathbf{K}\). We say that \(\lambda\) is an eigenvalue of \(f\) if there exists a non-zero vector \(u\in \mathbf{K}^n\) such that \(f(u)=\lambda u\).
Remark. If \(A\) is the matrix of \(f\) in some basis, we also say that \(\lambda\) is an eigenvalue of \(A\). This corresponds to the existence of a non-zero column vector \(X\) such that \(AX=\lambda X\).
Example 5.1 If \[A=\begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix}\]
then 1, 2, and 3 are eigenvalues because \(A\begin{bmatrix}1\\0\\0\end{bmatrix}=\begin{bmatrix}1\\0\\0\end{bmatrix}\), \(A\begin{bmatrix}0\\1\\0\end{bmatrix}=2\begin{bmatrix}0\\1\\0\end{bmatrix}\), and \(A\begin{bmatrix}0\\0\\1\end{bmatrix}=3\begin{bmatrix}0\\0\\1\end{bmatrix}\).
Example 5.2 If \(A=\begin{bmatrix}0&1\\1&0\end{bmatrix}\), then 1 is an eigenvalue because \(A\begin{bmatrix}1\\1\end{bmatrix}=\begin{bmatrix}1\\1\end{bmatrix}\).
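The defining relation \(AX=\lambda X\) is easy to check numerically. Here is a minimal sketch, assuming NumPy is available, that verifies the relations claimed in Examples 5.1 and 5.2 (the names `A1` and `A2` are ours):

```python
import numpy as np

# Example 5.1: a diagonal matrix with eigenvalues 1, 2, 3
A1 = np.diag([1.0, 2.0, 3.0])
for lam, u in [(1, [1, 0, 0]), (2, [0, 1, 0]), (3, [0, 0, 1])]:
    u = np.array(u, dtype=float)
    assert np.allclose(A1 @ u, lam * u)  # checks A u = lambda u

# Example 5.2: the swap matrix, with (1, 1) an eigenvector for the eigenvalue 1
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
u = np.array([1.0, 1.0])
assert np.allclose(A2 @ u, 1.0 * u)
```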
Definition 5.2 Let \(\lambda\in\mathbf{K}\) and \(f\) be a linear map from \(\mathbf{K}^n\) to itself (or the associated matrix \(A\)). The eigenspace associated with \(\lambda\) is \[E_\lambda=\{u\in\mathbf{K}^n \mid f(u)=\lambda u\}.\]
A non-zero element of \(E_\lambda\) is called an eigenvector of \(f\) (or of \(A\)).
Remark. For any \(f\) and \(\lambda\), we have \(0\in E_{\lambda}\), and \(\lambda\) is an eigenvalue if and only if \(E_\lambda\neq\{0\}\).
Example 5.3 For \(A=\begin{bmatrix}0&1\\1&0\end{bmatrix}\), \((x,y)\in E_\lambda\) if and only if \(y=\lambda x\) and \(x=\lambda y\).
Thus, \(E_1=\{(x,x) \mid x\in\mathbf{R}\}\), \(E_{-1}=\{(x,-x) \mid x\in\mathbf{R}\}\), and \(E_\lambda=\{0\}\) for \(\lambda\ne\pm1\) (\(x=\lambda^2 x \implies x=0\) and \(y=\lambda^2 y \implies y=0\)).
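Since \(E_\lambda\) is the set of solutions of \((A-\lambda I_n)X=0\), an eigenspace can be computed numerically as a null space. A small sketch, assuming NumPy is available; the helper `null_space`, built from the singular value decomposition, is our own illustration rather than part of the course:

```python
import numpy as np

def null_space(M, tol=1e-10):
    """Return an orthonormal basis (as columns) of ker(M), using the SVD."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))   # singular values below tol are treated as zero
    return vt[rank:].T

A = np.array([[0.0, 1.0], [1.0, 0.0]])
for lam in (1.0, -1.0, 2.0):
    E = null_space(A - lam * np.eye(2))
    print(lam, E.shape[1])        # dim E_lambda: 1, 1, 0, as in Example 5.3
```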
Lemma 5.1 For any linear map \(f\colon\mathbf{K}^n\to\mathbf{K}^n\) (or the associated matrix \(A\)) and \(\lambda\in\mathbf{K}\), \(E_\lambda\) is a vector subspace of \(\mathbf{K}^n\).
Proof. Let \(X, Y \in E_\lambda\). Then \(A(X+Y)=AX+AY=\lambda X+\lambda Y=\lambda (X+Y)\), so \(X+Y \in E_\lambda\).
Let \(\mu \in \mathbf{K}\) and \(X \in E_\lambda\). Then \(A(\mu X)=\mu AX=\mu \lambda X=\lambda(\mu X)\), so \(\mu X \in E_\lambda\).
Definition 5.3 Let \(A\in\mathrm{M}_n(\mathbf{K})\) and \(E\) be a vector subspace of \(\mathbf{K}^n\). We say that \(E\) is invariant under \(A\) if for all \(X \in E\), \(AX \in E\).
Lemma 5.2 Any eigenspace of a matrix \(A\in\mathrm{M}_n(\mathbf{K})\) is an invariant subspace for \(A\).
Proof. Let \(\lambda\) be an eigenvalue of \(A\) and \(X \in E_\lambda\). Then \(AX = \lambda X \in E_\lambda\).
Remark. The idea behind the introduction of invariant subspaces and eigenspaces is similar to “divide and conquer”. We want to decompose the space \(\mathbf{K}^n\) into a sum of subspaces that are invariant under \(A\), on which the restriction of \(A\) is simple, ideally a mere scaling, as is the case for an eigenspace.
5.2 Characteristic Polynomial
Definition 5.4 Let \(A\in\mathrm{M}_n(\mathbf{K})\). The characteristic polynomial of \(A\) is \(\det(A - X I_n)\). We denote it by \(\chi_A\).
Example 5.4 If \(A=\begin{bmatrix}1&2\\2&1\end{bmatrix}\), then \[\begin{align*} \chi_A(X)&=\det\left(\begin{bmatrix}1-X&2\\2&1-X\end{bmatrix}\right)\\ &=(1-X)^2-4\\ &=X^2-2X-3 \end{align*}\]
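This computation can be cross-checked numerically, assuming NumPy is available: `np.poly` returns the coefficients of \(\det(XI_n-A)\), which equals \((-1)^n\chi_A(X)\), so for \(n=2\) the two conventions coincide.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
coeffs = np.poly(A)   # coefficients of det(X*I_n - A), highest degree first
print(coeffs)         # [ 1. -2. -3.]  ->  X^2 - 2X - 3, as computed above
```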
Lemma 5.3 The characteristic polynomial of \(A\) is a polynomial of degree \(n\).
Proof. The determinant of a matrix is a sum of products of its entries. The entries of \(A-XI_n\) are the constants \(a_{i,j}\) for \(i\neq j\) and the degree-one polynomials \(a_{i,i}-X\) on the diagonal, so \(\chi_A\) is a polynomial in \(X\). Each product appearing in the expansion of the determinant contains at most \(n\) factors involving \(X\), so the degree is at most \(n\).
By expanding the calculations (or by induction), the coefficient of \(X^n\) in this polynomial is \((-1)^n\neq0\). Thus, the polynomial is of degree exactly \(n\).
Remark. The determinant satisfies the property \(\det(AB)=\det(A)\det(B)\) for \(A, B \in \mathrm{M}_n(\mathbf{K})\). From this, it follows that if \(P\) is an invertible matrix (\(PP^{-1}=I_n\)), then \(\det(P)\det(P^{-1})=1\), and hence \(\det(P^{-1})=\frac{1}{\det(P)}\).
Thus, if \(A, A'\) are two matrices representing the same linear map \(f\) in bases \(\mathcal{B}\) and \(\mathcal{B}'\), and \(P\) is the change of basis matrix from \(\mathcal{B}\) to \(\mathcal{B}'\), then \(\det(A')=\det(P^{-1})\det(A)\det(P)=\det(A)\). Therefore, for a linear map, the calculation of the determinant for the associated matrix does not depend on the choice of basis.
In particular, the same is true for the characteristic polynomial, so one can speak of the characteristic polynomial of a linear map unambiguously.
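As a quick numerical illustration of this invariance, assuming NumPy is available (the matrices below are arbitrary examples of ours): the determinant and the characteristic polynomial are unchanged under \(A' = P^{-1}AP\).

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
P = np.array([[1.0, 3.0], [1.0, 4.0]])           # any invertible matrix
A_prime = np.linalg.inv(P) @ A @ P               # same linear map in another basis

print(np.isclose(np.linalg.det(A), np.linalg.det(A_prime)))  # True
print(np.allclose(np.poly(A), np.poly(A_prime)))             # same characteristic polynomial
```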
Lemma 5.4 Let \(A\in\mathrm{M}_n(\mathbf{K})\) be a matrix with characteristic polynomial \(\chi_A\) and \(\lambda\in\mathbf{K}\).
The number \(\lambda\) is an eigenvalue of \(A\) if and only if \(\chi_A(\lambda)=0\).
Proof. The number \(\lambda\) is an eigenvalue of \(A\) if and only if there exists a non-zero \(U\in\mathbf{K}^n\) such that \(AU=\lambda U\), i.e., \((A-\lambda I_n)U=0\). This is equivalent to the linear map associated with \(A-\lambda I_n\) not being injective, which is equivalent to \(\det(A-\lambda I_n)=0\), i.e., \(\chi_A(\lambda)=0\).
Example 5.5 If \(A=\begin{bmatrix}1&2\\2&1\end{bmatrix}\), then \(\chi_A(X)=X^2-2X-3\) (Example 5.4), with roots \(-1\) and \(3\). Thus, \(A\) has two eigenvalues: \(-1\) and \(3\).
Proposition 5.1 Let \(T\) be an upper triangular matrix. The eigenvalues of \(T\) are exactly the diagonal entries \(T_{1,1},\dots,T_{n,n}\) of \(T\).
Proof. The matrix \(T - X I_n\) is also upper triangular. Its determinant is the product of the diagonal entries, i.e.,
\[\chi_T(X)=\prod_{i=1}^n(T_{i,i}-X).\] The roots of this polynomial are exactly the \(T_{i,i}\), which are therefore the eigenvalues of \(T\).
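A quick numerical illustration of Proposition 5.1, assuming NumPy is available (the matrix `T` is an arbitrary example of ours):

```python
import numpy as np

T = np.array([[2.0, 7.0, 1.0],
              [0.0, 5.0, 4.0],
              [0.0, 0.0, 5.0]])
# For a triangular matrix, the eigenvalues are the diagonal entries.
print(np.allclose(sorted(np.linalg.eigvals(T)), sorted(np.diag(T))))  # True
```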
5.3 Diagonalizable Matrices
Definition 5.5 A matrix \(A\in\mathrm{M}_n(\mathbf{K})\) is diagonal if \(A_{i,j}=0\) for \(i\neq j\).
Example 5.6 The matrix \(\begin{bmatrix}1&0&0\\0&3&0\\0&0&5\end{bmatrix}\) is diagonal.
The linear map associated with a diagonal matrix is very simple: it involves scaling along each axis of the canonical basis.
Remark. Diagonal matrices are special cases of upper triangular matrices.
Definition 5.6 A matrix \(A\in\mathrm{M}_n(\mathbf{K})\) is diagonalizable if there exists an invertible matrix \(P\in\mathrm{M}_n(\mathbf{K})\) such that \(P^{-1}AP\) is diagonal.
Remark. A matrix (or the associated linear map) is diagonalizable if, by changing the basis, that is, interpreting \(P\) as a change of basis matrix, the matrix of the linear map is diagonal.
Example 5.7 Consider Example 5.5 with \(A=\begin{bmatrix}1&2\\2&1\end{bmatrix}\). We have seen that this matrix has two eigenvalues: -1 and 3. We seek the associated eigenvectors.
Eigenspace \(E_{-1}\):
We solve the system \(AX=-X\) by letting \(X=\begin{bmatrix}x\\ y\end{bmatrix}\). We obtain the system
\[\left\{\begin{matrix}x+2y&=-x\\2x+y&=-y\end{matrix}\right.\]
which is equivalent to \(y=-x\). An eigenvector for -1 is therefore \(\begin{bmatrix}1\\-1\end{bmatrix}\).
Eigenspace \(E_{3}\):
We solve the system \(AX=3X\) by letting \(X=\begin{bmatrix}x\\ y\end{bmatrix}\). We obtain the system
\[\left\{\begin{matrix}x+2y&=3x\\2x+y&=3y\end{matrix}\right.\]
which is equivalent to \(y=x\). An eigenvector for 3 is therefore \(\begin{bmatrix}1\\1\end{bmatrix}\).
By setting \(P=\begin{bmatrix}1&1\\-1&1\end{bmatrix}\), we have \(P^{-1}=\frac{1}{2}\begin{bmatrix}1&-1\\1&1\end{bmatrix}\) and we obtain \(P^{-1}AP=\begin{bmatrix}-1&0\\0&3\end{bmatrix}\). The matrix \(A\) is indeed diagonalizable.
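A numerical check of this diagonalization, assuming NumPy is available:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
P = np.array([[1.0, 1.0], [-1.0, 1.0]])   # columns: eigenvectors for -1 and 3
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))                     # [[-1.  0.] [ 0.  3.]]
```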
Definition 5.7 (Direct Sum of Vector Subspaces) Let \(E\) be a vector subspace of \(\mathbf{R}^n\) and let \(E_1,\dots, E_k\) be vector subspaces contained within \(E\). We say that \(E\) is the direct sum of the \(E_i\) if for every \(v \in E\), there exists a unique \(k\)-tuple \((v_1, \dots, v_k)\) with \(v_1 \in E_1, \dots, v_k \in E_k\) such that \[v = v_1 + \dots + v_k.\] We denote this as \(E = E_1 \oplus \dots \oplus E_k\).
Lemma 5.5 Let \(E = E_1 \oplus \dots \oplus E_k\) be a direct sum of vector subspaces. Then \(\dim(E) = \sum_{i=1}^k \dim(E_i)\).
Proof. Choose a basis \((e_1^i, \dots, e_{l_i}^i)\) for each \(E_i\). The concatenation of these bases, \((e_1^1, \dots, e_{l_1}^1, e_1^2, \dots, e_{l_k}^k)\), forms a basis for \(E\): the existence of the decomposition \(v = v_1 + \dots + v_k\) shows that it spans \(E\), and its uniqueness (applied to \(v = 0\)) shows that it is linearly independent. Hence \(\dim(E) = \sum_{i=1}^k l_i = \sum_{i=1}^k \dim(E_i)\).
Proposition 5.2 Let \(A\) be a matrix in \(\mathrm{M}_n(\mathbf{R})\) and \(\lambda_1, \dots, \lambda_k\) its eigenvalues. Then \[\mathrm{Span}(E_{\lambda_1}, \dots, E_{\lambda_k}) = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}.\]
Proof. Since \(\mathrm{Span}(E_{\lambda_1}, \dots, E_{\lambda_k})\) is spanned by the eigenvectors of \(A\), it is enough to show that a family of eigenvectors corresponding to distinct eigenvalues forms a linearly independent set. In other words, it suffices to show that if \(v_i \in E_{\lambda_i}\) for each \(i\) and \(v_1 + \dots + v_k = 0\), then \(v_i = 0\) for all \(i\).
Applying \(A, A^2, \dots, A^{k-1}\) to the relation \(v_1 + \dots + v_k = 0\) gives, together with the relation itself: \[v_1 + \dots + v_k = 0\] \[\lambda_1 v_1 + \dots + \lambda_k v_k = 0\] \[\vdots\] \[\lambda_1^{k-1} v_1 + \dots + \lambda_k^{k-1} v_k = 0\] Since the \(\lambda_i\) are distinct, the coefficient matrix of this system is an invertible Vandermonde matrix, so, reading the system coordinate by coordinate, its only solution is \(v_1 = \dots = v_k = 0\). Thus, the sum is indeed direct.
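The key point of the proof is that a Vandermonde matrix built from distinct values is invertible. A small numerical illustration, assuming NumPy is available (the values are arbitrary):

```python
import numpy as np

lams = np.array([-1.0, 2.0, 5.0])        # distinct eigenvalues
V = np.vander(lams, increasing=True)     # rows (1, lam_i, lam_i^2)
print(np.linalg.det(V))                  # non-zero, since the lam_i are distinct
```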
Theorem 5.1 A matrix \(A\) with eigenspaces \(E_{\lambda_1}, \dots, E_{\lambda_k}\) is diagonalizable if and only if \(\mathbf{R}^n = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}\).
Proof. If \(A\) is diagonalizable, it means there exists a basis of eigenvectors (in which the matrix is diagonal). These eigenvectors are grouped by eigenvalue, which gives a direct sum decomposition of \(\mathbf{R}^n\).
Conversely, if \(\mathbf{R}^n = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}\), we choose a basis for each \(E_{\lambda_i}\) and concatenate these bases to obtain a basis for \(\mathbf{R}^n\) in which the matrix is diagonal.
Example 5.8 Suppose a linear transformation is represented by the matrix \[\begin{bmatrix}1&0&0&0&0\\0&\sqrt{2}&0&0&0\\0&0&\sqrt{2}&0&0\\0&0&0&5&0\\0&0&0&0&5\end{bmatrix}\] in the basis \((e_1, \dots, e_5)\). Then \(\mathbf{R}^5 = E_1 \oplus E_{\sqrt{2}} \oplus E_5\) where \(E_1 = \mathrm{Span}(e_1)\), \(E_{\sqrt{2}} = \mathrm{Span}(e_2, e_3)\), and \(E_5 = \mathrm{Span}(e_4, e_5)\).
Corollary 5.1 If \(A\) is a matrix with eigenspaces \(E_{\lambda_1}, \dots, E_{\lambda_k}\), then \(A\) is diagonalizable if and only if \(\sum_{i=1}^k \dim(E_{\lambda_i}) = n\).
Proof. If \(A\) is diagonalizable, then \(\mathbf{R}^n\) is a direct sum of the \(E_{\lambda_i}\), so \(\sum_{i=1}^k \dim(E_{\lambda_i}) = \dim(\mathbf{R}^n) = n\).
Conversely, we know that \(\mathrm{Span}(E_{\lambda_1}, \dots, E_{\lambda_k}) = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}\) and since \(\dim(\mathrm{Span}(E_{\lambda_1}, \dots, E_{\lambda_k})) = \sum_{i=1}^k \dim(E_{\lambda_i}) = n\), it must be that \(\mathrm{Span}(E_{\lambda_1}, \dots, E_{\lambda_k}) = \mathbf{R}^n\), thus \(\mathbf{R}^n = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}\).
Example 5.9 In Example 5.5, we have \(\dim(E_{-1}) \geq 1\) and \(\dim(E_{3}) \geq 1\), so the sum of these dimensions is at least 2. Therefore, we can immediately deduce that \(A\) is diagonalizable.
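Corollary 5.1 lends itself to a numerical check: by the rank–nullity theorem, \(\dim(E_\lambda)=n-\operatorname{rank}(A-\lambda I_n)\). A sketch, assuming NumPy is available, for the matrix of Example 5.5:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
n = A.shape[0]
dims = [n - np.linalg.matrix_rank(A - lam * np.eye(n)) for lam in (-1.0, 3.0)]
print(dims, sum(dims) == n)   # [1, 1] True  ->  A is diagonalizable
```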
5.4 Diagonalization of Symmetric Matrices
Definition 5.8 A matrix \(A \in \mathrm{M}_n(\mathbf{K})\) is symmetric if \(^tA = A\).
Remark. Visually, a matrix is symmetric if the main diagonal is an axis of symmetry for its entries.
The equation \(^tA = A\) translates to \(\forall 1 \leq i, j \leq n\), \(A_{i,j} = A_{j,i}\).
Example 5.10 The matrix \(\begin{bmatrix}1&4&2\\4&2&3\\2&3&5\end{bmatrix}\) is symmetric.
Example 5.11 Let \(X_1, \dots, X_n\) be real random variables on the same probability space. Then the covariance matrix \(A = \left(\mathrm{Cov}(X_i, X_j)\right)_{1 \leq i, j \leq n}\) is a symmetric matrix in \(\mathrm{M}_n(\mathbf{R})\), since \(\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)\).
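A small illustration of this example, assuming NumPy is available; here `np.cov` produces an empirical covariance matrix from random samples, which is symmetric like its theoretical counterpart:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(3, 1000))  # three random variables, 1000 observations each
C = np.cov(samples)                   # empirical covariance matrix, shape (3, 3)
print(np.allclose(C, C.T))            # True: a covariance matrix is symmetric
```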
Lemma 5.6 Let \(A \in \mathrm{M}_n(\mathbf{R})\). The matrix \(A\) is symmetric if and only if for all \(X, Y \in \mathbf{R}^n\), \(\left< AX, Y \right> = \left< X, AY \right>\).
Proof. If \(A\) is symmetric, then for all \(X, Y \in \mathbf{R}^n\), \[\left< AX, Y \right> = {}^t(AX) Y = {}^tX\, {}^tA\, Y = {}^tX A Y = \left< X, AY \right>.\] Conversely, taking \(X = e_i\) and \(Y = e_j\) in the canonical basis gives \(A_{j,i} = \left< Ae_i, e_j \right> = \left< e_i, Ae_j \right> = A_{i,j}\), so \(A\) is symmetric.
Definition 5.9 A matrix \(A \in \mathrm{M}_n(\mathbf{R})\) is positive if for all \(X \in \mathbf{R}^n\), \(^tXAX \geq 0\).
A matrix \(A \in \mathrm{M}_n(\mathbf{R})\) is positive definite if for all non-zero \(X \in \mathbf{R}^n\), \(^tXAX > 0\).
Lemma 5.7 Let \(A \in \mathrm{M}_n(\mathbf{R})\) be a symmetric positive matrix. Every eigenvalue of \(A\) is non-negative.
Moreover, \(A\) is positive definite if and only if \(0\) is not an eigenvalue, i.e., if and only if \(A\) is invertible.
Proof. Let \(\lambda\) be an eigenvalue of \(A\) and \(X\) a non-zero eigenvector associated with \(\lambda\), i.e., \(AX = \lambda X\).
Then \(^tXAX = \lambda ^tXX = \lambda ||X||^2\). Since \(||X||^2 > 0\) and \(^tXAX \geq 0\), we have \(\lambda = ^tXAX / ||X||^2 \geq 0\), and if \(A\) is positive definite, then \(\lambda > 0\).
The number \(0\) is not an eigenvalue if and only if the kernel of \(A\) is reduced to \(\{0\}\), i.e., if and only if the linear transformation associated with \(A\) is injective; since the dimension of the domain space matches the dimension of the codomain space, this is equivalent to \(A\) being invertible.
It remains to check that if \(A\) is positive, symmetric, and invertible, then it is positive definite. Suppose \(^tXAX = 0\) for some non-zero \(X\). For every \(Y \in \mathbf{R}^n\) and \(t \in \mathbf{R}\), positivity and the symmetry of \(A\) give \[0 \leq {}^t(X+tY)A(X+tY) = 2t\,{}^tYAX + t^2\,{}^tYAY,\] and letting \(t \to 0\) from both signs forces \(^tYAX = 0\). Taking \(Y = AX\) gives \(||AX||^2 = 0\), so \(AX = 0\), contradicting invertibility. Hence \(^tXAX > 0\) for every non-zero \(X\), i.e., \(A\) is positive definite.
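Numerically, positivity of a symmetric matrix is typically tested through its eigenvalues, consistently with Lemma 5.7, or through a Cholesky factorization, which succeeds exactly for symmetric positive definite matrices. A sketch, assuming NumPy is available (the matrices are arbitrary examples of ours):

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])    # symmetric, eigenvalues 1 and 3
S0 = np.array([[1.0, 1.0], [1.0, 1.0]])   # symmetric, eigenvalues 0 and 2

print(np.linalg.eigvalsh(S))    # all > 0  -> positive definite
print(np.linalg.eigvalsh(S0))   # contains 0 -> positive but not definite (not invertible)
np.linalg.cholesky(S)           # succeeds; it would raise LinAlgError for S0
```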
Lemma 5.8 Let \(A \in \mathrm{M}_n(\mathbf{R})\) be a symmetric matrix. If \(E\) is an invariant subspace for \(A\), then \(E^\bot\) is also an invariant subspace.
Proof. Let \(X \in E\) and \(Y \in E^\bot\). We have \(\left< AY, X \right> = ^t(AY) X = ^tY ^tA X = ^tY (AX) = \left< Y, AX \right> = 0\) because \(AX \in E\).
Thus, \(AY \in E^\bot\).
Definition 5.10 (Orthogonal Sum of Subspaces) Let \(E\) be a vector subspace of \(\mathbf{R}^n\) and let \(E_1, \dots, E_k\) be vector subspaces contained within \(E\). We say that \(E\) is the orthogonal sum of the \(E_i\) if for \(i \neq j\), \(E_i \perp E_j\) and for every \(v \in E\), there exist \(v_i \in E_i\) for each \(i \in \{1, \dots, k\}\) such that \[v = v_1 + \dots + v_k.\] We denote this as \[E = E_1 \oplus^\bot \dots \oplus^\bot E_k.\]
Remark. If \(E = E_1 \oplus^\bot \dots \oplus^\bot E_k\) is an orthogonal sum, then the decomposition \(v = v_1 + \dots + v_k\) is unique because each \(v_i\) is the orthogonal projection of \(v\) onto \(E_i\). Thus, an orthogonal sum is always a direct sum.
Proposition 5.3 Let \(A\) be a symmetric matrix and \(\lambda_1, \lambda_2\) be two distinct eigenvalues of \(A\). Then \(E_{\lambda_1} \perp E_{\lambda_2}\).
Proof. Let \(X \in E_{\lambda_1}\) and \(Y \in E_{\lambda_2}\). Then
\[^tYAX = ^tY(AX) = \lambda_1 ^tYX = \lambda_1 \langle X, Y \rangle\] and
\[^tYAX = ^t(AY)X = \lambda_2 ^tYX = \lambda_2 \langle X, Y \rangle.\]
Thus, \((\lambda_1 - \lambda_2) \langle X, Y \rangle = 0\) and since \((\lambda_1 - \lambda_2) \neq 0\), we have \(\langle X, Y \rangle = 0\).
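A quick numerical check of Proposition 5.3 on the matrix of Example 5.5, assuming NumPy is available:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
X = np.array([1.0, -1.0])   # eigenvector for -1
Y = np.array([1.0, 1.0])    # eigenvector for 3
print(np.allclose(A @ X, -X), np.allclose(A @ Y, 3 * Y), X @ Y)   # True True 0.0
```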
Theorem 5.2 Let \(A \in \mathrm{M}_n(\mathbf{R})\) be a symmetric matrix. Then there exists an orthogonal matrix \(P \in \mathrm{O}_n(\mathbf{R})\) such that \(P^{-1}AP = {}^tPAP\) is diagonal.
Remark. The theorem means exactly that there exists an orthonormal basis in which the matrix of the linear transformation associated with \(A\) is diagonal, or equivalently that \(\mathbf{R}^n = E_{\lambda_1} \oplus \dots \oplus E_{\lambda_k}\) where the \(\lambda_i\) are the eigenvalues of \(A\).
Proof. We prove the result by induction on \(n\).
Base case \(n=1\). In dimension 1, any matrix is symmetric and diagonal simultaneously. There is nothing to prove.
Inductive step. Assume the result for \(n-1 \geq 1\).
Consider the function \(\varphi: X \mapsto \langle X, AX \rangle\) defined for \(X\) on the unit sphere \(\mathbf{S}^{n-1} = \{X \in \mathbf{R}^n \mid ||X|| = 1\}\). Let \(X\) be an element of \(\mathbf{S}^{n-1}\) that maximizes \(\varphi\) (such an element exists due to the “compactness” of the unit sphere and the continuity of \(\varphi\)). Fix \(X\) and, for \(H \perp X\), consider the function \[\begin{array}{rcl} f: \mathbf{R} & \to & \mathbf{R} \\ t & \mapsto & \varphi\left(\frac{X + tH}{||X + tH||}\right) = \left\langle \frac{X + tH}{||X + tH||}, A\left(\frac{X + tH}{||X + tH||}\right)\right\rangle \end{array},\] which is well-defined since \(X + tH \neq 0\) for all \(t\) (\(X \perp H\) and \(||X|| = 1\)).
We have \(f(t) = \frac{\left\langle X + tH, A(X + tH) \right\rangle}{\left\langle X + tH, X + tH \right\rangle}\) and hence \(f(t) = \frac{g(t)}{h(t)}\) with
\[g(t) = \left\langle X + tH, A(X + tH) \right\rangle\] and \[h(t) = \left\langle X + tH, X + tH \right\rangle.\]
Since \(\varphi\) reaches its maximum at \(X\), \(f\) also reaches its maximum at \(t=0\).
Expanding and using the symmetry of \(A\), we find \(g(t) = \left\langle X, AX \right\rangle + 2t \left\langle H, AX \right\rangle + O(t^2)\) and \(h(t) = \left\langle X, X \right\rangle + 2t \left\langle X, H \right\rangle + O(t^2)\).
The derivatives at 0 are \(g'(0) = 2 \left\langle H, AX \right\rangle\) and \(h'(0) = 2 \left\langle X, H \right\rangle = 0\). Differentiating the quotient, we get \(f'(0) = \frac{g'(0) h(0) - g(0) h'(0)}{h(0)^2}\). Since \(h(0) = \left\langle X, X \right\rangle = 1\), we have
\[f'(0) = \left\langle H, AX \right\rangle.\]
Since \(t = 0\) is a maximum of \(f\), we have \(f'(0) = 0\), hence \(H \perp AX\). This is true for all \(H \in X^\bot\), so \(AX \in (X^\bot)^\bot = \mathbf{R}X\). Thus, there exists \(\lambda_0 \in \mathbf{R}\) such that \(AX = \lambda_0 X\), and hence \(X\) is a non-zero eigenvector. By Lemmas 5.2 and 5.8, \(E_{\lambda_0}^\bot\) is an invariant subspace for \(A\), so we may let \(A'\) be the restriction of \(A\) to \(E_{\lambda_0}^\bot\). By the characterization in Lemma 5.6, \(A'\) is also symmetric because for all \(X, Y \in E_{\lambda_0}^\bot\), \(\langle X, A'Y \rangle = \langle X, AY \rangle = \langle AX, Y \rangle = \langle A'X, Y \rangle\).
Since \(\dim(E_{\lambda_0}^\bot) < n\), we can apply the induction hypothesis to \(A'\) acting on the space \(E_{\lambda_0}^\bot\). We then obtain that \(E_{\lambda_0}^\bot = E_{\lambda_1} \oplus^\bot \dots \oplus^\bot E_{\lambda_k}\) where the \(\lambda_i\) are the eigenvalues of \(A'\). In the end, \(\mathbf{R}^n = E_{\lambda_0} \oplus^\bot E_{\lambda_1} \oplus^\bot \dots \oplus^\bot E_{\lambda_k}\) and we have decomposed \(\mathbf{R}^n\) into an orthogonal direct sum of eigenspaces.
We choose an orthonormal basis for each of the \(E_{\lambda_i}\) and concatenate them to obtain an orthonormal basis \(\mathcal{B}\) for \(\mathbf{R}^n\). The change of basis matrix \(P\) from the canonical basis to \(\mathcal{B}\) is therefore an orthogonal matrix, and \(P^{-1}AP\) is thus a diagonal matrix with the \(\lambda_i\) on the diagonal.
Example 5.12 In Example 5.7, we diagonalized the matrix \(A\), but to obtain an orthogonal change of basis matrix, we need to normalize the basis vectors (we already know they are orthogonal from Proposition 5.3). The change of basis matrix \(P = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\) is then orthogonal, with \(P^{-1} = ^tP = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\), and we obtain
\[P^{-1}AP = \begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}.\]
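In practice, the orthogonal diagonalization guaranteed by Theorem 5.2 is obtained with `np.linalg.eigh`, which, for a symmetric matrix, returns the eigenvalues together with an orthogonal matrix of eigenvectors. A sketch, assuming NumPy is available, reproducing this example:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
eigenvalues, P = np.linalg.eigh(A)       # columns of P: orthonormal eigenvectors
print(eigenvalues)                       # [-1.  3.]
print(np.allclose(P.T @ P, np.eye(2)))   # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))         # diagonal matrix diag(-1, 3)
```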
5.5 An Application: Calculating Powers of a Matrix
In some problems, it is useful to calculate the powers of a matrix. For a diagonal matrix, things are simple; for a diagonalizable matrix, we reduce to this simple case.
Lemma 5.9 Let \(D\) be a diagonal matrix \(D = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & \dots & 0 & \lambda_n \end{bmatrix}\). Then for all \(k \in \mathbf{N}\),
\[D^k = \begin{bmatrix} \lambda_1^k & 0 & \dots & 0 \\ 0 & \lambda_2^k & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & \dots & 0 & \lambda_n^k \end{bmatrix}.\]
Proof. We prove the result by induction on \(k \geq 1\). For \(k=1\), the result is obvious, and to go from \(k\) to \(k+1\), the result follows from matrix calculation:
\[D^{k+1} = D^kD = \begin{bmatrix} \lambda_1^k & 0 & \dots & 0 \\ 0 & \lambda_2^k & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & \dots & 0 & \lambda_n^k \end{bmatrix} \cdot \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & \dots & 0 & \lambda_n \end{bmatrix} = \begin{bmatrix} \lambda_1^{k+1} & 0 & \dots & 0 \\ 0 & \lambda_2^{k+1} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & \dots & 0 & \lambda_n^{k+1} \end{bmatrix}.\]
Proposition 5.4 Let \(A\) be a diagonalizable matrix where \(A = PDP^{-1}\) with \(D\) diagonal and \(P\) a change of basis matrix. Then
\[A^k = PD^kP^{-1}.\]
Proof. In fact, the calculation is true as soon as \(A = PDP^{-1}\), whether \(D\) is a diagonal matrix or not, and we prove the result by induction on \(k \in \mathbf{N}\). For \(k=1\), the result is immediate. The transition from \(k\) to \(k+1\) follows from the calculation:
\[A^{k+1} = AA^k = PDP^{-1}PD^kP^{-1} = PDD^kP^{-1} = PD^{k+1}P^{-1}.\]
Example 5.13 Consider Example 5.12, where \(P^{-1}AP = \begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}\) with \(P = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\), so that \(A = PDP^{-1}\) with \(D = \begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}\).
We have
\[\begin{align*} A^k &= \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} (-1)^k & 0 \\ 0 & 3^k \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \\ &= \begin{bmatrix} \frac{(-1)^k + 3^k}{2} & \frac{(-1)^{k+1} + 3^k}{2} \\ \frac{(-1)^{k+1} + 3^k}{2} & \frac{(-1)^k + 3^k}{2} \end{bmatrix}. \end{align*}\]
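A numerical check of this formula, assuming NumPy is available, comparing \(PD^kP^{-1}\) (here \(P^{-1}={}^tP\) since \(P\) is orthogonal) with a direct computation of \(A^k\):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])
P = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)   # orthogonal, so P^{-1} = P^T

for k in range(6):
    lhs = P @ np.diag([(-1.0) ** k, 3.0 ** k]) @ P.T   # P D^k P^{-1}
    rhs = np.linalg.matrix_power(A, k)
    assert np.allclose(lhs, rhs)
print("A^k = P D^k P^{-1} checked for k = 0, ..., 5")
```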
5.6 Exercises
Exercise 5.1 Let \(A\) be the matrix \(\begin{bmatrix} 2 & 1 & 2 \\ 1 & 2 & 2 \\ 2 & 2 & 1 \end{bmatrix}\).
- Justify a priori that the matrix is diagonalizable with an orthonormal basis.
- Compute the characteristic polynomial of \(A\).
- Deduce the eigenvalues of \(A\).
- Diagonalize \(A\) with an orthonormal basis, i.e., find a diagonal matrix \(D\) and an orthogonal matrix \(P\) such that \(A = PDP^{-1}\).
- Deduce \(A^k\) for \(k \in \mathbf{N}\).
Exercise 5.2 Consider the linear transformation given by the matrix \[A = \begin{bmatrix} 1 & -1 & 1 \\ -2 & 0 & -1 \\ -2 & -2 & 1 \end{bmatrix}.\]
- Determine its characteristic polynomial.
- Compute its eigenvalues.
- Provide a basis for the eigenspaces.
- Diagonalize \(A\).
Exercise 5.3 Consider the linear transformation given by the matrix \[A = \begin{bmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}.\]
- Justify that \(A\) is diagonalizable.
- Compute the characteristic polynomial of \(A\) and find its roots.
- What is the dimension of \(E_{1}\)? Provide a basis for this eigenspace.
- Determine a basis for \(E_{-2}\) and deduce an orthogonal change of basis matrix for diagonalizing \(A\).
Exercise 5.4 Let \(A\) be a matrix in \(\mathrm{M}_n(\mathbf{R})\). Consider the matrix \(S = ^tAA\).
- Show that \(S\) is symmetric positive.
- Prove that \(S\) is positive definite if and only if \(A\) is invertible.
- Justify that conversely, if \(S\) is a positive symmetric matrix, then there exists a matrix \(A\) such that \(S = ^tAA\). You may first handle the case where \(S\) is diagonal.