Chapter 2 Matrices
2.1 Vector Space of Matrices of Size m×n
Definition 2.1 (Matrix) Let \mathbf{K} be a field and m, n \in \mathbf{N} be integers with m, n \geq 1. A matrix A of size m \times n is an array of numbers A = (a_{i,j}) with a_{i,j} \in \mathbf{K} for 1 \leq i \leq m and 1 \leq j \leq n, arranged in m rows and n columns.
The integer i is the row index and j is the column index. The numbers a_{i,j} are the coefficients of the matrix A.
Example 2.1 The array of numbers
A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}
is a matrix with 3 rows and 4 columns with coefficients in \mathbf{F}_2.
Example 2.2 A row vector v = (v_1, \dots, v_n) in \mathbf{K}^n can be viewed as a matrix of size 1 \times n. The column vector v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} can be viewed as a matrix of size n \times 1.
Each matrix A = (a_{i,j}) of size m \times n can be identified with a vector in \mathbf{K}^{mn}, namely the vector (a_{1,1}, \dots, a_{m,1}, a_{1,2}, \dots, a_{m,2}, \dots, a_{1,n}, \dots, a_{m,n}).
Visually, for the matrix A = \begin{bmatrix} 2 & 1 \\ 3 & 0 \\ 4 & 1 \end{bmatrix}, we obtain the column vector
\begin{bmatrix} 2 \\ 3 \\ 4 \\ 1 \\ 0 \\ 1 \end{bmatrix},
which corresponds to stacking the columns of A on top of each other.
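This column-stacking identification is easy to check numerically. The following sketch uses NumPy (an assumption of mine; the text itself uses no software), whose `order='F'` flattening reads entries column by column:

```python
import numpy as np

# Illustration (not from the text): stacking the columns of a matrix
# identifies M_{m,n}(K) with K^{mn}.  order='F' (Fortran order) reads
# the entries column by column, matching the identification above.
A = np.array([[2, 1],
              [3, 0],
              [4, 1]])
v = A.flatten(order='F')  # columns of A stacked on top of each other
print(v)  # [2 3 4 1 0 1]
```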
Definition 2.2 (Vector Space of Matrices) The set of matrices of size m \times n with coefficients in \mathbf{K} is denoted \mathrm{M}_{m,n}(\mathbf{K}). With the identification of a matrix with a column vector given above, the set \mathrm{M}_{m,n}(\mathbf{K}) is identified with the vector space \mathbf{K}^{mn}. When m = n, we simply write \mathrm{M}_n(\mathbf{K}) for the set of square matrices of size n \times n.
The identification of \mathrm{M}_{m,n}(\mathbf{K}) with \mathbf{K}^{mn} endows matrices with a vector space structure. Thus, there is a vector addition and a scalar multiplication for matrices. In fact, these operations are performed simply entry by entry.
Definition 2.3 (Matrix Addition) Let A = (a_{i,j}) and B = (b_{i,j}) be two matrices of size m \times n over a field \mathbf{K}. The sum A + B is the matrix (a_{i,j} + b_{i,j}).
Definition 2.4 (Scalar Multiplication) Let A = (a_{i,j}) be a matrix of size m \times n over a field \mathbf{K} and \lambda \in \mathbf{K}. The matrix \lambda A is the matrix (\lambda a_{i,j}).
Example 2.3 Consider the matrices A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} and B = \begin{bmatrix} 0 & 2 \\ 2 & 1 \end{bmatrix}. Their sum is
A + B = \begin{bmatrix} 1 & 4 \\ 4 & 4 \end{bmatrix}.
For \lambda = 3, the multiplication of A by 3 is
3A = \begin{bmatrix} 3 & 6 \\ 6 & 9 \end{bmatrix}.
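Since both operations act entry by entry, they can be sketched in a couple of lines. This check of Example 2.3 uses NumPy (my choice of tool, not the text's):

```python
import numpy as np

# Sketch: matrix addition and scalar multiplication act entrywise,
# reproducing Example 2.3 of the text.
A = np.array([[1, 2], [2, 3]])
B = np.array([[0, 2], [2, 1]])
S = A + B        # entrywise sum
T = 3 * A        # entrywise scalar multiple
print(S)  # [[1 4] [4 4]]
print(T)  # [[3 6] [6 9]]
```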
Remark. One could also identify a matrix A of size m \times n with a row vector of length mn by placing the rows of A one after the other. This would provide another identification of \mathrm{M}_{m,n}(\mathbf{K}) with a vector space, but it would not change the operations of vector addition and scalar multiplication.
The canonical basis of \mathbf{K}^{mn} corresponds to matrices that are zero everywhere except for a single 1. We write E_{i,j} for the matrix whose entries are all zero except for a 1 at position (i, j):
E_{i,j} = \begin{bmatrix} 0 & \dots & 0 & \dots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \dots & 1 & \dots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \dots & 0 & \dots & 0 \end{bmatrix},
where the 1 lies in row i and column j.
Definition 2.5 The canonical basis of \mathrm{M}_{m,n}(\mathbf{K}) is the family (E_{i,j})_{1 \leq i \leq m, 1 \leq j \leq n}.
Thus, the matrix A = (a_{i,j}) can be decomposed in the basis (E_{i,j}) as
A = \sum_{i,j} a_{i,j} E_{i,j}. The identification of \mathrm{M}_{m,n}(\mathbf{K}) with \mathbf{K}^{mn} immediately gives the following result.
Lemma 2.1 The dimension of the space of matrices \mathrm{M}_{m,n}(\mathbf{K}) is mn.
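As a numerical illustration of the decomposition above (using NumPy, which the text does not assume), one can rebuild any matrix from the mn canonical basis matrices E_{i,j}:

```python
import numpy as np

# Sketch: E(i, j) is the matrix with a single 1 at position (i, j).
# Any A decomposes as sum of a_{i,j} * E_{i,j}, so these m*n matrices
# span M_{m,n}(K), consistent with Lemma 2.1.
m, n = 2, 3

def E(i, j):
    M = np.zeros((m, n))
    M[i, j] = 1.0
    return M

A = np.array([[2.0, 0.0, 3.0],
              [1.0, 1.0, 3.0]])
recomposed = sum(A[i, j] * E(i, j) for i in range(m) for j in range(n))
print(np.array_equal(A, recomposed))  # True
```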
2.2 Linear Transformation Associated with a Matrix
Definition 2.6 Let \mathbf{K} be a field. A function f from \mathbf{K}^n to \mathbf{K}^m is linear if:
- For all u, v \in \mathbf{K}^n, f(u + v) = f(u) + f(v),
- For all u \in \mathbf{K}^n and \lambda \in \mathbf{K}, f(\lambda u) = \lambda f(u).
Remark. If f is a linear transformation, then f(0) = 0. Indeed, for any u \in \mathbf{K}^n, f(0) = f(0 \cdot u) = 0 \cdot f(u) = 0.
Definition 2.7 (Linear Transformation Associated with a Matrix) Let A \in \mathrm{M}_{m,n}(\mathbf{K}). The linear transformation associated with A is the transformation that maps a vector u = (u_j) \in \mathbf{K}^n to the vector v = (v_i) \in \mathbf{K}^m given by the relations
v_i = \sum_{j=1}^n a_{i,j} u_j.
The fact that the associated transformation above is indeed linear can be verified as follows. Let A = (a_{i,j}) be the matrix and f the associated transformation.
- For two vectors u, u' \in \mathbf{K}^n, if v = f(u + u'), then v_i = \sum_{j=1}^n a_{i,j}(u_j + u'_j) = \sum_{j=1}^n a_{i,j} u_j + \sum_{j=1}^n a_{i,j} u'_j, and thus f(u + u') = f(u) + f(u').
- Similarly, for u \in \mathbf{K}^n and \lambda \in \mathbf{K}, if v = f(\lambda u), then v_i = \sum_{j=1}^n a_{i,j}(\lambda u_j) = \lambda \left( \sum_{j=1}^n a_{i,j} u_j \right), and thus f(\lambda u) = \lambda f(u).
It should be noted that the number of rows of A, m in this case, is the dimension of the codomain. The number of columns of A, which is n, corresponds to the dimension of the domain.
Example 2.4 Let A be the matrix with real coefficients
A = \begin{bmatrix} 2 & 0 & 3 \\ 1 & 1 & 3 \end{bmatrix}.
The associated linear transformation is the function f: \mathbf{R}^3 \to \mathbf{R}^2 given by the formula
f(x, y, z) = (2x + 3z, \; x + y + 3z).
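The formula v_i = \sum_j a_{i,j} u_j is exactly a matrix-vector product. A quick NumPy sketch of Example 2.4 (the tool is my assumption, not the text's):

```python
import numpy as np

# Sketch of Example 2.4: the map u -> A u implements
# v_i = sum_j a_{i,j} u_j, here f(x, y, z) = (2x + 3z, x + y + 3z).
A = np.array([[2, 0, 3],
              [1, 1, 3]])

def f(x, y, z):
    return A @ np.array([x, y, z])

print(f(1, 1, 1))  # [5 5], since 2+0+3 = 5 and 1+1+3 = 5
```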
2.3 Matrix Associated with a Linear Transformation
We have seen that each matrix is associated with a linear transformation. Conversely, to every linear transformation one can associate a matrix, after choosing a basis for the domain space and a basis for the codomain space.
Definition 2.8 Let f: \mathbf{K}^n \to \mathbf{K}^m be a linear transformation. Let \mathcal{B} = (e_1, \dots, e_n) be a basis of \mathbf{K}^n and \mathcal{B}' = (e'_1, \dots, e'_m) be a basis of \mathbf{K}^m. The matrix of f with respect to the bases \mathcal{B} and \mathcal{B}' is the matrix A = (a_{i,j}) \in \mathrm{M}_{m,n}(\mathbf{K}) where the a_{i,j} are the coefficients of f(e_j) in the basis \mathcal{B}', that is,
f(e_j) = \sum_i a_{i,j} e'_i.
This matrix is denoted \mathrm{Mat}_{\mathcal{B}, \mathcal{B}'}(f).
Thus, the j-th column of \mathrm{Mat}_{\mathcal{B}, \mathcal{B}'}(f) contains the coordinates of the image f(e_j) in the basis \mathcal{B}'.
Remark. The most common case is where the bases B and B′ are the canonical bases. In this case, the j-th column of the matrix is exactly the image of the vector (0,…,0,1,0,…,0) where the only non-zero coefficient is in the j-th position.
Example 2.5 Consider the canonical bases of \mathbf{F}_2^2 and \mathbf{F}_2^4. The function f: \mathbf{F}_2^2 \to \mathbf{F}_2^4 given by f(x, y) = (x, y, x + y, 0) has the matrix
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}.
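The recipe "column j = image of the j-th canonical basis vector" can be sketched directly. This NumPy illustration of Example 2.5 uses arithmetic mod 2 to mimic \mathbf{F}_2 (a modeling choice of mine; the text works abstractly):

```python
import numpy as np

# Sketch: the j-th column of Mat(f) is f applied to the j-th canonical
# basis vector.  Reproduces Example 2.5, with % 2 standing in for F_2.
def f(x, y):
    return np.array([x, y, (x + y) % 2, 0])

columns = [f(*e) for e in ([1, 0], [0, 1])]   # images of e_1, e_2
M = np.column_stack(columns)
print(M)
# [[1 0]
#  [0 1]
#  [1 1]
#  [0 0]]
```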
Example 2.6 If f: \mathbf{K}^n \to \mathbf{K}^n is the identity transformation, i.e., f(u) = u for all u \in \mathbf{K}^n, then the associated matrix (in any basis, provided the same basis is used for the domain and the codomain) is the identity matrix:
I_n = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & 1 \end{bmatrix}.
Remark. If we start from a matrix A, pass to its associated linear transformation f, and then take the matrix of f in the canonical bases, we recover A, precisely because of the relation f(e_j) = \sum_i a_{i,j} e'_i.
Thus, the operation that associates a matrix with its linear transformation is the inverse of the operation that associates a linear transformation with its matrix (when using canonical bases).
Remark. If n = m and \mathcal{B} = \mathcal{B}', we simply write \mathrm{Mat}_{\mathcal{B}}(f) for the matrix associated with the linear transformation f in the basis \mathcal{B}.
2.4 Matrix Multiplication
What happens when we compose two linear transformations? The proposition below shows that we obtain another linear transformation. How does this translate to the associated matrices? We will see that it corresponds to matrix multiplication.
Definition 2.9 (Matrix Multiplication) Let A = (a_{i,k}) \in \mathrm{M}_{l,m}(\mathbf{K}) and B = (b_{k,j}) \in \mathrm{M}_{m,n}(\mathbf{K}). The product of A and B is the matrix C \in \mathrm{M}_{l,n}(\mathbf{K}) given by
c_{i,j} = \sum_{k=1}^m a_{i,k} b_{k,j}.
Remark. To multiply a matrix A by a matrix B, it is necessary that the number of columns in A is equal to the number of rows in B.
Remark. In general, even if A and B are square matrices of the same size, AB≠BA. We will see examples in exercises.
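A small numerical example of non-commutativity (the matrices below are my own choice, not taken from the exercises):

```python
import numpy as np

# Illustration: two square matrices of the same size generally
# do not commute under matrix multiplication.
A = np.array([[0, 1],
              [0, 0]])
B = np.array([[0, 0],
              [1, 0]])
print(A @ B)  # [[1 0] [0 0]]
print(B @ A)  # [[0 0] [0 1]]  -- different, so AB != BA
```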
Lemma 2.2 Let f be a linear transformation from \mathbf{K}^n to \mathbf{K}^m with matrix A in the bases \mathcal{B} and \mathcal{B}'. If u \in \mathbf{K}^n has coordinates given by the column vector X in the basis \mathcal{B}, then f(u) has coordinates given by the column vector Y = AX in the basis \mathcal{B}'.
Proof. Let \mathcal{B} = (e_1, \dots, e_n) and \mathcal{B}' = (e'_1, \dots, e'_m). Then u = \sum_{j=1}^n X_j e_j and
f(u) = \sum_{j=1}^n X_j f(e_j) = \sum_{j=1}^n X_j \left( \sum_{i=1}^m A_{i,j} e'_i \right) = \sum_{i=1}^m \left( \sum_{j=1}^n A_{i,j} X_j \right) e'_i.
Thus, we obtain Y_i = \sum_{j=1}^n A_{i,j} X_j, which is exactly the matrix product formula for AX.
Proposition 2.1 Let f: \mathbf{K}^n \to \mathbf{K}^m and h: \mathbf{K}^m \to \mathbf{K}^l be linear transformations. The composition h \circ f: \mathbf{K}^n \to \mathbf{K}^l is a linear transformation.
Moreover, if B,B′,B″ are bases of \mathbf{K}^n, \mathbf{K}^m, and \mathbf{K}^l, respectively, and if A = \mathrm{Mat}_{\mathcal{B}', \mathcal{B}''}(h) and B = \mathrm{Mat}_{\mathcal{B}, \mathcal{B}'}(f), and C = \mathrm{Mat}_{\mathcal{B}, \mathcal{B}''}(h \circ f), then
C = AB.
Proof. First, we show that h \circ f is linear. Let u, v \in \mathbf{K}^n. Then
\begin{align*} (h \circ f)(u + v) &= h(f(u + v)) \\ &= h(f(u) + f(v)) \\ &= h(f(u)) + h(f(v)) \\ &= (h \circ f)(u) + (h \circ f)(v), \end{align*}
where we used the linearity of f and then of h. Similarly, for \lambda \in \mathbf{K} and u \in \mathbf{K}^n,
\begin{align*} (h \circ f)(\lambda u) &= h(f(\lambda u)) \\ &= h(\lambda f(u)) \\ &= \lambda (h \circ f)(u). \end{align*}
Now, let’s compute the columns of the matrix of h \circ f in the bases \mathcal{B} and \mathcal{B}'' of \mathbf{K}^n and \mathbf{K}^l. Let \mathcal{B} = (e_j)_{1 \leq j \leq n}, \mathcal{B}' = (e'_k)_{1 \leq k \leq m}, and \mathcal{B}'' = (e''_i)_{1 \leq i \leq l} be the three bases considered. The j-th column of the matrix C is given by h \circ f(e_j). Thus,
\begin{align*} h \circ f(e_j) &= h(f(e_j)) \\ &= h\left(\sum_{k=1}^m b_{k,j} e'_k\right) \\ &= \sum_{k=1}^m b_{k,j} h(e'_k) \\ &= \sum_{k=1}^m b_{k,j} \left(\sum_{i=1}^l a_{i,k} e''_i\right) \\ &= \sum_{i=1}^l \left(\sum_{k=1}^m a_{i,k} b_{k,j}\right) e''_i. \end{align*}
This shows exactly that c_{i,j} = \sum_{k=1}^m a_{i,k} b_{k,j}.
Example 2.7 If A = \begin{bmatrix} 3 & 1 & 2 \\ 1 & 1 & 0 \end{bmatrix} and B = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 3 \end{bmatrix}, then
AB = \begin{bmatrix} 6 & 5 & 6 \\ 2 & 1 & 0 \end{bmatrix}.
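The entrywise formula of Definition 2.9 translates into a triple loop. This sketch (NumPy is my assumption; a plain list of lists would work too) checks it against Example 2.7:

```python
import numpy as np

# Sketch: the formula c_{i,j} = sum_k a_{i,k} b_{k,j} from
# Definition 2.9, verified on the matrices of Example 2.7.
def matmul(A, B):
    l, m = A.shape
    m2, n = B.shape
    assert m == m2, "columns of A must match rows of B"
    C = np.zeros((l, n), dtype=A.dtype)
    for i in range(l):
        for j in range(n):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[3, 1, 2],
              [1, 1, 0]])
B = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 2, 3]])
print(matmul(A, B))
# [[6 5 6]
#  [2 1 0]]
```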
2.5 Image, Kernel, and Rank
Associated with a matrix (or a linear transformation), we find two important vector subspaces: its image and its kernel. These two spaces characterize the surjectivity and injectivity of the transformation.
Definition 2.10 The image of a linear transformation f: \mathbf{K}^n \to \mathbf{K}^m is the set \{v \in \mathbf{K}^m \mid \exists u \in \mathbf{K}^n, f(u) = v\}, denoted \mathrm{Im}(f).
Proposition 2.2 The image of a linear transformation f is a vector subspace of \mathbf{K}^m. Furthermore, if A is the matrix of f in the canonical bases, then \mathrm{Im}(f) is the vector space spanned by the columns of A.
Proof. Let v_1, v_2 \in \mathrm{Im}(f). Then there exist u_1, u_2 \in \mathbf{K}^n such that v_i = f(u_i), so v_1 + v_2 = f(u_1) + f(u_2) = f(u_1 + u_2) \in \mathrm{Im}(f). Similarly, if \lambda \in \mathbf{K}, then \lambda v_1 = \lambda f(u_1) = f(\lambda u_1) \in \mathrm{Im}(f). This shows that \mathrm{Im}(f) is indeed a vector subspace of \mathbf{K}^m.
Let v \in \mathrm{Im}(f). Then there exists u \in \mathbf{K}^n such that v = f(u). In the canonical basis (e_j) of \mathbf{K}^n, u = \sum_{j=1}^n u_j e_j, so f(u) = \sum_{j=1}^n u_j f(e_j). This shows that v is a linear combination of f(e_j). These are exactly the columns of the matrix A, so \mathrm{Im}(f) is the vector space spanned by the columns of A.
Definition 2.11 The rank of a linear transformation f is the dimension of \mathrm{Im}(f). It is denoted \mathrm{rank}(f).
For the associated matrix, this is the dimension of the vector space spanned by its columns.
Example 2.8 The real matrix A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} has image equal to all of \mathbf{R}^2 because the vectors \begin{bmatrix} 1 \\ 0 \end{bmatrix} and \begin{bmatrix} 0 \\ 1 \end{bmatrix} form a basis of \mathbf{R}^2. Therefore, its rank is 2.
Example 2.9 The matrix \begin{bmatrix} 1 & 2 & 4 \\ 2 & 1 & 5 \\ 1 & 3 & 5 \end{bmatrix} has image \mathrm{Vect}\left(\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}\right) because these two vectors form a linearly independent (non-collinear) and generating set since \begin{bmatrix} 4 \\ 5 \\ 5 \end{bmatrix} = 2 \times \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}. Thus, its rank is 2.
Example 2.10 The matrix \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} has image \{(x, y, 0) \mid x, y \in \mathbf{R}\}. Therefore, its rank is 2.
Example 2.11 The matrix \begin{bmatrix}1 & 2 \\ 3 & 6 \end{bmatrix} has rank 1, with its image spanned by \begin{bmatrix}1 \\ 3 \end{bmatrix} since the second column vector is a multiple of the first.
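The ranks computed by hand in Examples 2.8 to 2.11 can be cross-checked numerically. The sketch below uses NumPy's SVD-based rank routine (my tooling choice, reliable for small integer matrices like these):

```python
import numpy as np

# Illustration: numerical rank of the matrices from Examples 2.8-2.11.
examples = [
    np.array([[1, 0, 1], [0, 1, 1]]),                 # Example 2.8
    np.array([[1, 2, 4], [2, 1, 5], [1, 3, 5]]),      # Example 2.9
    np.array([[0, 1, 1], [1, 0, 1], [0, 0, 0]]),      # Example 2.10
    np.array([[1, 2], [3, 6]]),                       # Example 2.11
]
ranks = [int(np.linalg.matrix_rank(M)) for M in examples]
print(ranks)  # [2, 2, 2, 1]
```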
Proposition 2.3 A linear transformation f: \mathbf{K}^n \to \mathbf{K}^m is surjective if and only if \mathrm{rank}(f) = m.
Proof. The transformation f is surjective if and only if \mathrm{Im}(f) = \mathbf{K}^m. Since \mathrm{Im}(f) is a vector subspace of \mathbf{K}^m, these two spaces are equal if and only if they have the same dimension, i.e., \mathrm{rank}(f) = m.
Definition 2.12 Let f: \mathbf{K}^n \to \mathbf{K}^m be a linear transformation. Its kernel is the set \{u \in \mathbf{K}^n \mid f(u) = 0\}. It is denoted \mathrm{ker}(f).
Remark. If A is the matrix of f in the canonical bases of \mathbf{K}^n and \mathbf{K}^m, then the kernel is the set of vectors X \in \mathbf{K}^n such that AX = 0.
Proposition 2.4 The kernel of a linear transformation f: \mathbf{K}^n \to \mathbf{K}^m is a vector subspace of \mathbf{K}^n.
Proof. Let f: \mathbf{K}^n \to \mathbf{K}^m be a linear transformation. If u_1, u_2 \in \mathrm{ker}(f), then f(u_1 + u_2) = f(u_1) + f(u_2) = 0 + 0 = 0, so u_1 + u_2 \in \mathrm{ker}(f). Similarly, if \lambda \in \mathbf{K}, then f(\lambda u_1) = \lambda f(u_1) = \lambda \cdot 0 = 0, so \lambda u_1 \in \mathrm{ker}(f). Finally, 0 \in \mathrm{ker}(f) because f(0) = 0, so \mathrm{ker}(f) is a non-empty subset of \mathbf{K}^n closed under addition and scalar multiplication, hence a vector subspace.
Proposition 2.5 Let f: \mathbf{K}^n \to \mathbf{K}^m be a linear transformation. The transformation f is injective if and only if \mathrm{ker}(f) = \{0\}.
Proof. We always have 0 \in \mathrm{ker}(f) because f(0) = 0. If f is injective, then 0 is the only vector u such that f(u) = 0, so \mathrm{ker}(f) = \{0\}.
Conversely, suppose \mathrm{ker}(f) = \{0\}. Let u, u' be two vectors in \mathbf{K}^n such that f(u) = f(u'), then f(u - u') = 0, so u - u' \in \mathrm{ker}(f). Thus, u - u' = 0 and u = u'. This shows that f is injective.
Remark. If f is the linear transformation associated with a matrix A, we will use \mathrm{Im}(f) or \mathrm{Im}(A) for the image of f, and \ker(f) or \ker(A) for the kernel of f interchangeably.
Example 2.12 The kernel of the matrix A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} is the set of vectors X = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \in \mathbf{R}^3 such that
AX = 0, which gives the system
\left\{ \begin{matrix} y + z &= 0 \\ x + z &= 0 \\ 0 &= 0 \end{matrix}\right.
Thus, it is the set of vectors of the form X = \begin{bmatrix} x \\ x \\ -x \end{bmatrix} with x \in \mathbf{R}.
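A quick numerical check of Example 2.12 (NumPy and the rank-nullity theorem, which this chapter does not prove, are both my assumptions here):

```python
import numpy as np

# Sketch checking Example 2.12: every vector (x, x, -x) lies in the
# kernel of A.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 0, 0]])
for x in (1.0, -2.0, 3.5):
    X = np.array([x, x, -x])
    assert np.allclose(A @ X, 0)

# dim ker(A) = n - rank(A) by the rank-nullity theorem (not proved
# in this chapter): here 3 - 2 = 1, one free parameter x.
print(3 - np.linalg.matrix_rank(A))  # 1
```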
2.6 Solving Linear Systems
In practice, we often encounter situations where we need to solve a linear system of the form
\left\{ \begin{matrix} a_{1,1} x_1 + &\dots &+ a_{1,n} x_n &= b_1 \\ &\vdots &&\vdots \\ a_{m,1} x_1 + &\dots &+ a_{m,n} x_n &= b_m \end{matrix} \right.
with m equations in n unknowns. Matrix multiplication shows that this system can be rewritten as
AX = B, where A = [a_{i,j}] \in \mathrm{M}_{m,n}(\mathbf{K}), X = [x_{j}] \in \mathbf{K}^n, and B = [b_{i}] \in \mathbf{K}^m.
Once the system is written in this form, we can address questions regarding its solution: Is there a solution? If so, is it unique?
Proposition 2.6 The system AX = B has a solution if and only if B \in \mathrm{Im}(A). Moreover, if X_1 and X_2 are two solutions, then X_1 - X_2 \in \mathrm{ker}(A).
Proof. The system AX = B can be rewritten as
x_1 A_1 + \dots + x_n A_n = B, where the A_i are the column vectors of A. Thus, the system has a solution if and only if B is a linear combination of the A_i, i.e., B \in \mathrm{Im}(A).
If X_1, X_2 are two solutions, then
A(X_1 - X_2) = AX_1 - AX_2 = B - B = 0, i.e., X_1 - X_2 \in \mathrm{ker}(A).
Corollary 2.1 If A has rank m, there is always a solution, and if \ker(A) = \{0\}, then there is at most one solution.
Example 2.13 The linear system AX = B where A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} and B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} has a solution if and only if B \in \mathrm{Im}(A) \iff b_3 = 0, and in this case, the solution is not unique because \mathrm{ker}(A) \neq \{0\}.
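Example 2.13 can be explored numerically. The sketch below uses NumPy's least-squares solver to probe solvability (a tooling choice of mine: `lstsq` returns an exact solution precisely when the residual vanishes, i.e., when B \in \mathrm{Im}(A)):

```python
import numpy as np

# Sketch for Example 2.13: AX = B is solvable exactly when b_3 = 0.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [0., 0., 0.]])

B = np.array([2., 3., 0.])          # b_3 = 0, so a solution exists
X, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.allclose(A @ X, B))        # True: exact (non-unique) solution

B_bad = np.array([2., 3., 1.])      # b_3 != 0: no exact solution
X2, *_ = np.linalg.lstsq(A, B_bad, rcond=None)
print(np.allclose(A @ X2, B_bad))   # False: only a least-squares fit
```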
2.7 Exercises
Exercise 2.1 Find all matrices M \in \mathrm{M}_2(\mathbf{R}) such that
M \begin{bmatrix} 5 & 3 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 5 & 3 \\ 0 & 5 \end{bmatrix} M.
Exercise 2.2 Consider the linear transformations f: \mathbf{R}^2 \to \mathbf{R}^3 and g: \mathbf{R}^3 \to \mathbf{R}^2 given by the following formulas:
f(x, y) = (3x + 2y, y - x, 2y),
g(x, y, z) = (x + y + z, z - y).
Using matrix multiplication, give the expression for the function h = g \circ f: \mathbf{R}^2 \to \mathbf{R}^2.
Exercise 2.3 Calculate the image, rank, and kernel of the following matrices. Determine whether the associated linear transformations are injective and/or surjective.
- A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix},
- B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix},
- C = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 5 & 10 \end{bmatrix}.
Exercise 2.4 Do the following systems have a solution? If so, is it unique?
- \left\{\begin{matrix} x + 2y + 3z &= 7 \\ 2x + 4y + 6z &= 14 \end{matrix}\right.
- \left\{\begin{matrix} x + 2y + 3z &= 7 \\ 2x + 4y + 6z &= 13 \end{matrix}\right.
Exercise 2.5 Let A = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}. Consider the mappings
\begin{array}{rcl} G_A: \mathrm{M}_2(\mathbf{R}) & \to & \mathrm{M}_2(\mathbf{R}) \\ M & \mapsto & AM \end{array} and \begin{array}{rcl} D_A: \mathrm{M}_2(\mathbf{R}) & \to & \mathrm{M}_2(\mathbf{R}) \\ M & \mapsto & MA \end{array}
- Show that D_A and G_A are linear transformations of the vector space \mathrm{M}_2(\mathbf{R}).
- Provide the matrices associated with these linear transformations in the canonical basis of \mathrm{M}_2(\mathbf{R}).
Exercise 2.6 (Adjacency Matrix and Number of Paths) Let G be an undirected graph with vertices labeled from 1 to n. Let A be the adjacency matrix of G (i.e., A_{i,j} = 1 if there is an edge between i and j, and A_{i,j} = 0 otherwise).
- Show by induction that the coefficient (A^k)_{i,j} of indices (i, j) in A^k (which is the product \underbrace{A \times \dots \times A}_{k\ \mathrm{times}}) equals the number of paths of length k between i and j.
- Show that the graph is connected if and only if for all (i, j), there exists k \in \mathbf{N} such that (A^k)_{i,j} \neq 0.
- Show that if such a k \in \mathbf{N} exists, then there is also such a k with k \leq n.
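A numerical illustration of the first part of Exercise 2.6 (not a proof, and the graph below is my own example) on the path graph 1 - 2 - 3:

```python
import numpy as np

# Illustration of Exercise 2.6 on the path graph 1 - 2 - 3:
# entry (i, j) of A^k counts the walks of length k between i and j.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A2 = A @ A
print(A2)
# [[1 0 1]
#  [0 2 0]
#  [1 0 1]]
# e.g. (A^2)[0, 2] = 1: the unique length-2 path 1 - 2 - 3.
```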
Exercise 2.7 A matrix A \in \mathrm{M}_n(\mathbf{K}) is upper triangular if A_{i,j} = 0 for i > j. We denote by \mathrm{T}_n(\mathbf{K}) the set of upper triangular matrices.
- Provide examples of upper triangular matrices and non-upper triangular matrices for n = 3.
- Show that \mathrm{T}_n(\mathbf{K}) is a vector subspace of \mathrm{M}_n(\mathbf{K}).
- Show that if A, B \in \mathrm{T}_n(\mathbf{K}), then AB \in \mathrm{T}_n(\mathbf{K}).
- Provide a basis for the vector space \mathrm{T}_n(\mathbf{K}) and deduce its dimension.