Linear algebra (Osnabrück 2024-2025)/Part I/Lecture 24

The Theorem of Cayley-Hamilton

One highlight of linear algebra is the Theorem of Cayley-Hamilton. In order to formulate this theorem, recall that we can plug a square matrix into a polynomial, see the 20th lecture. Here, the variable ${}X$ is replaced everywhere by the matrix ${}M$, the power ${}M^{i}$ is the ${}i$-fold matrix product of ${}M$ with itself, and the addition is the (componentwise) addition of matrices. A scalar ${}a$ has to be interpreted as ${}a$ times the identity matrix. For the polynomial

{}P=3X^{2}-5X+2\,

and the matrix

{}M={\begin{pmatrix}2&4\\3&1\end{pmatrix}}\,,

we get

{}{\begin{aligned}P(M)&=3{\begin{pmatrix}2&4\\3&1\end{pmatrix}}^{2}-5{\begin{pmatrix}2&4\\3&1\end{pmatrix}}+2\\&={\begin{pmatrix}3&0\\0&3\end{pmatrix}}{\begin{pmatrix}16&12\\9&13\end{pmatrix}}+{\begin{pmatrix}-5&0\\0&-5\end{pmatrix}}{\begin{pmatrix}2&4\\3&1\end{pmatrix}}+{\begin{pmatrix}2&0\\0&2\end{pmatrix}}\\&={\begin{pmatrix}40&16\\12&36\end{pmatrix}}.\end{aligned}}
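The computation above can be reproduced mechanically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper names (`identity`, `mat_mul`, `mat_add`, `scale`, `poly_at_matrix`) are illustrative and not part of the lecture.

```python
# Evaluate P = 3X^2 - 5X + 2 at the matrix M from the example above.

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def poly_at_matrix(coeffs, m):
    """coeffs[i] is the coefficient of X^i; the constant term contributes coeffs[0] * E_n."""
    n = len(m)
    result = [[0] * n for _ in range(n)]
    power = identity(n)              # M^0 = E_n
    for c in coeffs:
        result = mat_add(result, scale(c, power))
        power = mat_mul(power, m)    # next power of M
    return result

M = [[2, 4], [3, 1]]
P = [2, -5, 3]                       # 2 - 5X + 3X^2
print(poly_at_matrix(P, M))          # [[40, 16], [12, 36]]
```

The constant term is handled by starting with the power ${}M^{0}=E_{n}$, which matches the convention that a scalar is read as a multiple of the identity matrix.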

For a fixed matrix ${}M\in \operatorname {Mat} _{n}(K)$ , we have the substitution mapping

K[X]\longrightarrow \operatorname {Mat} _{n}(K),P\longmapsto P(M).

This is (like the substitution mapping for an element ${}a\in K$) a ring homomorphism, that is, the relations (see also Lemma 20.3)

(P+Q)(M)=P(M)+Q(M),\,(P\cdot Q)(M)=P(M)\circ Q(M){\text{ and }}1(M)=E_{n}

hold. The Theorem of Cayley-Hamilton answers the question of what happens when we insert a matrix in its characteristic polynomial.

Theorem

Let ${}K$ be a field, and let ${}M$ be an ${}n\times n$ -matrix. Let

{}\chi _{M}=X^{n}+c_{n-1}X^{n-1}+\cdots +c_{1}X+c_{0}\,

denote the characteristic polynomial of ${}M$ . Then

{}\chi _{M}\,(M)=M^{n}+c_{n-1}M^{n-1}+\cdots +c_{1}M+c_{0}=0\,.

This means that the characteristic polynomial annihilates the matrix.
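As an illustration of the statement, here is a small check for the ${}2\times 2$ matrix ${}M$ from the beginning of the lecture. It is a sketch that assumes the standard formula ${}\chi _{M}=X^{2}-\operatorname {tr} (M)X+\det(M)$ for ${}2\times 2$ matrices; the variable names are chosen for this example only.

```python
M = [[2, 4], [3, 1]]

trace = M[0][0] + M[1][1]                      # 3
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]    # 2 - 12 = -10

# M^2 as a plain 2x2 matrix product
M2 = [[sum(M[i][t] * M[t][j] for t in range(2)) for j in range(2)] for i in range(2)]

# chi_M(M) = M^2 - trace * M + det * E_2
chi_at_M = [[M2[i][j] - trace * M[i][j] + (det if i == j else 0) for j in range(2)]
            for i in range(2)]
print(chi_at_M)   # [[0, 0], [0, 0]]
```

Here ${}\chi _{M}=X^{2}-3X-10$, and inserting ${}M$ gives the zero matrix, as the theorem predicts.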

Proof

We consider the matrix ${}XE_{n}-M$ as a matrix whose entries are in the field ${}K(X)$ . The adjugate matrix

(XE_{n}-M)^{\operatorname {adj} }

also belongs to ${}\operatorname {Mat} _{n}(K(X))$. The entries of the adjugate matrix are by definition (up to sign) the determinants of ${}(n-1)\times (n-1)$-submatrices of ${}XE_{n}-M$. In the entries of ${}XE_{n}-M$, the variable ${}X$ occurs at most in its first power, so that, in the entries of the adjugate matrix, the variable occurs at most in its ${}(n-1)$-th power. We write

(XE_{n}-M)^{\operatorname {adj} }=X^{n-1}A_{n-1}+X^{n-2}A_{n-2}+\cdots +XA_{1}+A_{0}\,

with matrices

{}A_{i}\in \operatorname {Mat} _{n}(K)\,,

that is, we write the entries as polynomials, and we collect all coefficients referring to ${}X^{i}$ into a matrix. Because of Theorem 17.9 , we have

{}{\begin{aligned}\chi _{M}E_{n}&=(XE_{n}-M)\circ (XE_{n}-M)^{\operatorname {adj} }\\&=(XE_{n}-M)\circ (X^{n-1}A_{n-1}+X^{n-2}A_{n-2}+\cdots +XA_{1}+A_{0})\\&=X^{n}A_{n-1}+X^{n-1}(A_{n-2}-M\circ A_{n-1})+X^{n-2}(A_{n-3}-M\circ A_{n-2})+\cdots +X^{1}(A_{0}-M\circ A_{1})-M\circ A_{0}.\end{aligned}}

We can write the matrix on the left according to the powers of ${}X$ and we get

\chi _{M}E_{n}=X^{n}E_{n}+X^{n-1}c_{n-1}E_{n}+X^{n-2}c_{n-2}E_{n}+\cdots +X^{1}c_{1}E_{n}+c_{0}E_{n}\,.

Since these polynomials coincide, their coefficients coincide. That is, we have a system of equations

{\begin{matrix}E_{n}&=&A_{n-1}\\c_{n-1}E_{n}&=&A_{n-2}-M\circ A_{n-1}\\c_{n-2}E_{n}&=&A_{n-3}-M\circ A_{n-2}\\\vdots &\vdots &\vdots \\c_{1}E_{n}&=&A_{0}-M\circ A_{1}\\c_{0}E_{n}&=&-M\circ A_{0}\,.\end{matrix}}

We multiply these equations from the left, from top to bottom, by ${}M^{n},M^{n-1},M^{n-2},\ldots ,M^{1},E_{n}$, yielding the system of equations

{\begin{matrix}M^{n}&=&M^{n}\circ A_{n-1}\\c_{n-1}M^{n-1}&=&M^{n-1}\circ A_{n-2}-M^{n}\circ A_{n-1}\\c_{n-2}M^{n-2}&=&M^{n-2}\circ A_{n-3}-M^{n-1}\circ A_{n-2}\\\vdots &\vdots &\vdots \\c_{1}M^{1}&=&M\circ A_{0}-M^{2}\circ A_{1}\\c_{0}E_{n}&=&-M\circ A_{0}\,.\end{matrix}}

If we add the left-hand sides of this system, then we just get ${}\chi _{M}\,(M)$. If we add the right-hand sides, then we get ${}0$, because every partial summand ${}M^{i+1}\circ A_{i}$ occurs once positively and once negatively. Hence, we have ${}\chi _{M}\,(M)=0$.

\Box
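The coefficient comparison in the proof can be followed concretely in the case ${}n=2$. The sketch below assumes the explicit ${}2\times 2$ adjugate: for ${}M={\begin{pmatrix}a&b\\c&d\end{pmatrix}}$, the adjugate of ${}XE_{2}-M$ is ${}XA_{1}+A_{0}$ with ${}A_{1}=E_{2}$ and ${}A_{0}={\begin{pmatrix}-d&b\\c&-a\end{pmatrix}}$. The helper names are illustrative.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

M = [[2, 4], [3, 1]]
E = [[1, 0], [0, 1]]
(a, b), (c, d) = M

# Coefficient matrices of (X E_2 - M)^adj = X * A1 + A0 (2x2 case only).
A1 = E
A0 = [[-d, b], [c, -a]]

# chi_M = X^2 + c1 X + c0
c1 = -(a + d)
c0 = a * d - b * c

# The three equations obtained by comparing coefficients of X^2, X^1, X^0.
top_ok = (E == A1)
middle_ok = (scale(c1, E) == sub(A0, mat_mul(M, A1)))
bottom_ok = (scale(c0, E) == scale(-1, mat_mul(M, A0)))
print(top_ok, middle_ok, bottom_ok)   # True True True
```

Multiplying these three equations from the left by ${}M^{2}$, ${}M$ and ${}E_{2}$ and adding them gives the telescoping sum used in the proof.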

Theorem

Let ${}V$ be a finite-dimensional vector space over a field ${}K$ , and let

f\colon V\longrightarrow V

denote a linear mapping. Then the characteristic polynomial of ${}f$ fulfills the relation

{}\chi _{f}(f)=0\,.

Proof

This follows immediately from Theorem 24.1 .

\Box

Minimal polynomial and characteristic polynomial

Corollary

Let ${}V$ be a finite-dimensional vector space over a field ${}K$ , and let

f\colon V\longrightarrow V

be a linear mapping. Then the characteristic polynomial ${}\chi _{f}$ is a multiple of the minimal polynomial ${}\mu _{f}$ of ${}f$.

Proof

This follows directly from Theorem 24.2 and Corollary 20.12 .

\Box

In particular, the degree of the minimal polynomial of a linear mapping ${}f\colon V\rightarrow V$ is bounded by the dimension of the vector space ${}V$. The minimal polynomial and the characteristic polynomial are related in several respects; for example, they have the same zeroes.

Lemma

Let ${}V$ be a finite-dimensional vector space over a field ${}K$ , and let

f\colon V\longrightarrow V

be a linear mapping. Let ${}v\in V$ be an eigenvector of ${}f$ with eigenvalue ${}\lambda$ , and let ${}P\in K[X]$ denote a polynomial. Then

{}(P(f))(v)=P(\lambda )v\,.

In particular, ${}v$ is an eigenvector of ${}P(f)$ with eigenvalue ${}P(\lambda )$. The vector ${}v\neq 0$ belongs to the kernel of ${}P(f)$ if and only if ${}\lambda$ is a zero of ${}P$.

Proof

We have

{}(f^{k})(v)=\lambda ^{k}v\,.

This implies the statement, since the assignment ${}P\mapsto P(f)$ is compatible with addition and scalar multiplication.

\Box
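A quick numerical illustration of this lemma, using the matrix ${}M$ and the polynomial ${}P=3X^{2}-5X+2$ from the beginning of the lecture. The eigenvector ${}v=(4,3)$ with eigenvalue ${}5$ was found by hand for this example; the function name `apply` is illustrative.

```python
M = [[2, 4], [3, 1]]
v = [4, 3]      # eigenvector of M, chosen for this example
lam = 5         # its eigenvalue

def apply(m, x):
    """Matrix times column vector."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

Mv = apply(M, v)       # [20, 15] = 5 * v
MMv = apply(M, Mv)     # [100, 75] = 25 * v

# Left-hand side: (P(M))(v) = 3 M^2 v - 5 M v + 2 v
lhs = [3 * p - 5 * q + 2 * r for p, q, r in zip(MMv, Mv, v)]

# Right-hand side: P(lambda) * v
P_lam = 3 * lam ** 2 - 5 * lam + 2     # 52
rhs = [P_lam * x for x in v]

print(lhs, rhs)   # [208, 156] [208, 156]
```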

Lemma

Let ${}V$ be a finite-dimensional vector space over a field ${}K$ , and let

f\colon V\longrightarrow V

be a linear mapping. Then the characteristic polynomial ${}\chi _{f}$ and the minimal polynomial ${}\mu _{f}$ have the same zeroes.

Proof

It follows directly from Cayley-Hamilton that the zeroes of the minimal polynomial are also zeroes of the characteristic polynomial.

To prove the other implication, let ${}\lambda \in K$ be a zero of the characteristic polynomial, and let ${}v\in V$ denote an eigenvector of ${}f$ with eigenvalue ${}\lambda$; its existence is guaranteed by Theorem 23.2. We write the minimal polynomial as

{}\mu _{f}=(X-\lambda _{1})^{m_{1}}\cdots (X-\lambda _{k})^{m_{k}}Q\,,

where ${}Q$ has no zero. Then

{}{\begin{aligned}0&=\mu _{f}(f)\\&={\left((X-\lambda _{1})^{m_{1}}\cdots (X-\lambda _{k})^{m_{k}}Q\right)}(f)\\&=(f-\lambda _{1}\operatorname {Id} _{V})^{m_{1}}\cdots (f-\lambda _{k}\operatorname {Id} _{V})^{m_{k}}Q(f).\end{aligned}}

We apply this mapping to ${}v$. By the preceding lemma on polynomials applied to eigenvectors, the factors send the vector ${}v$ to ${}(\lambda -\lambda _{i})^{m_{i}}v$ or to ${}Q(\lambda )v$, respectively. Altogether, ${}v$ is sent to

(\lambda -\lambda _{1})^{m_{1}}\cdots (\lambda -\lambda _{k})^{m_{k}}Q(\lambda )v.

As the composed mapping is the zero mapping, ${}v\neq 0$ and ${}Q(\lambda )\neq 0$, we must have ${}\lambda _{i}=\lambda$ for some ${}i$.

\Box
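The two polynomials have the same zeroes, but they need not be equal. As a hedged illustration (the matrix is a hypothetical example, not taken from the lecture), consider the diagonal matrix with diagonal entries ${}2,2,3$. Its characteristic polynomial is ${}(X-2)^{2}(X-3)$, while already ${}(X-2)(X-3)$ annihilates it, and neither linear factor alone does.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def shifted(m, a):
    """The matrix m - a * E_n."""
    n = len(m)
    return [[m[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]

M = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]
zero = [[0] * 3 for _ in range(3)]

A = shifted(M, 2)    # M - 2 E_3
B = shifted(M, 3)    # M - 3 E_3

product_is_zero = (mat_mul(A, B) == zero)       # (X-2)(X-3) annihilates M
factors_nonzero = (A != zero and B != zero)     # X-2 and X-3 alone do not
print(product_is_zero, factors_nonzero)         # True True
```

So in this example the minimal polynomial is ${}(X-2)(X-3)$, of degree ${}2$, with the same zeroes ${}2$ and ${}3$ as the characteristic polynomial.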

Further examples

For the moment, we will apply the following concept only to invertible matrices.

Definition

Let ${}G$ be a group and ${}g\in G$ an element. Then we call the smallest positive integer ${}n$ with ${}g^{n}=e_{G}$ the order of ${}g$. For this, we write ${}\operatorname {ord} \,(g)$. If all positive powers of ${}g$ are different from the neutral element, then we set ${}\operatorname {ord} \,(g)=\infty$.

We consider linear mappings

\varphi \colon V\longrightarrow V

with the property that some power of it is the identity, say

{}\varphi ^{k}=\operatorname {Id} _{V}\,,

that is, ${}\varphi$ has finite order. Typical examples are rotations by an angle of ${}{\frac {360}{k}}$ degrees. The polynomial ${}X^{k}-1$ annihilates this endomorphism and is, therefore, a multiple of the minimal polynomial.
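For a concrete instance, the rotation of the plane by ${}90$ degrees is described by an integer matrix, so its order can be found by computing powers exactly. This is a minimal sketch; the function `order` and its search bound are assumptions made for this example, and a return value of `None` only means that no power up to the bound is the identity.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def order(m, bound=100):
    """Smallest positive exponent e <= bound with m^e = E_n, or None if there is none."""
    n = len(m)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    power = m                      # holds m^e in each step
    for e in range(1, bound + 1):
        if power == identity:
            return e
        power = mat_mul(power, m)
    return None

R = [[0, -1], [1, 0]]    # rotation of the plane by 90 degrees
rotation_order = order(R)
print(rotation_order)    # 4
```

Accordingly, ${}X^{4}-1$ annihilates this rotation.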

Definition

Let ${}K$ be a field, and ${}n\in \mathbb {N} _{+}$. A zero of the polynomial

X^{n}-1

in ${}K$ is called an ${}n$-th root of unity in ${}K$.

Lemma

Let ${}n\in \mathbb {N} _{+}$. The zeroes of the polynomial ${}X^{n}-1$ over ${}\mathbb {C}$ are

e^{2\pi {\mathrm {i} }k/n}=\cos {\frac {2\pi k}{n}}+{\mathrm {i} }\sin {\frac {2\pi k}{n}},k=0,1,\ldots ,n-1.

In ${}\mathbb {C} [X]$, we have the factorization

{}X^{n}-1=(X-1){\left(X-e^{2\pi {\mathrm {i} }/n}\right)}\cdots {\left(X-e^{2\pi {\mathrm {i} }(n-1)/n}\right)}\,.

Proof

The proof uses some basic facts about the complex exponential function. We have

{}{\left(e^{2\pi {\mathrm {i} }k/n}\right)}^{n}=e^{2\pi {\mathrm {i} }k}={\left(e^{2\pi {\mathrm {i} }}\right)}^{k}=1^{k}=1\,.

Hence, the given complex numbers are indeed zeroes of the polynomial ${}X^{n}-1$ . These zeroes are all different, because

{}e^{2\pi {\mathrm {i} }k/n}=e^{2\pi {\mathrm {i} }\ell /n}\,

with ${}0\leq k\leq \ell \leq n-1$ implies, by considering the quotient, ${}e^{2\pi {\mathrm {i} }(\ell -k)/n}=1$. Since ${}0\leq (\ell -k)/n<1$, this forces

{}\ell -k=0\,.

Therefore, there exist ${}n$ explicit distinct zeroes. Since a polynomial of degree ${}n$ has at most ${}n$ zeroes, these are all the zeroes of the polynomial. The explicit description in coordinates follows from Euler's formula.

\Box
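The lemma can be checked numerically with floating-point complex numbers. The sketch below uses Python's `cmath` module for ${}n=6$; the tolerance `tol` and the sample point `x` are arbitrary choices for this illustration, and the comparison is only up to rounding errors.

```python
import cmath

def roots_of_unity(n):
    """The numbers e^{2 pi i k / n} for k = 0, ..., n-1 as Python complex numbers."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

n = 6
roots = roots_of_unity(n)
tol = 1e-9   # rounding tolerance, an arbitrary choice

# Each number is a zero of X^n - 1, and the n numbers are pairwise different.
all_are_zeroes = all(abs(z ** n - 1) < tol for z in roots)
all_distinct = all(abs(roots[i] - roots[j]) > tol
                   for i in range(n) for j in range(i + 1, n))

# Factorization check at a sample point x: the product of (x - z) should equal x^n - 1.
x = 2
product = 1
for z in roots:
    product *= (x - z)
factorization_ok = abs(product - (x ** n - 1)) < tol

print(all_are_zeroes, all_distinct, factorization_ok)
```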

Definition

For a permutation ${}\pi$ on ${}\{1,\ldots ,n\}$ , the ${}n\times n$ -matrix

{}M_{\pi }={\left(a_{ij}\right)}\,,

where

{}a_{\pi (j),j}=1\,

and all other entries are ${}0$, is called a permutation matrix.

We want to determine the characteristic polynomial of a permutation matrix. Here, we use that a permutation is a product of cycles. For a cycle of the form ${}1\mapsto 2\mapsto 3\mapsto \ldots \mapsto k\mapsto 1$ , the corresponding permutation matrix is

{\begin{pmatrix}0&0&\ldots &0&1\\1&0&0&\ldots &0\\0&1&0&\ldots &0\\\vdots &\ddots &\ddots &\ddots &\vdots \\0&\ldots &0&1&0\end{pmatrix}}.

Every cycle can be brought (by renumbering) into this form.
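The definition translates directly into a small construction. In the following sketch, a permutation is given as a Python list `pi` with 0-based indices, so `pi[j]` plays the role of ${}\pi (j)$; this encoding and the function name are assumptions made for the example.

```python
def permutation_matrix(pi):
    """Matrix with entry 1 at position (pi[j], j) and 0 elsewhere (0-based indices)."""
    n = len(pi)
    m = [[0] * n for _ in range(n)]
    for j in range(n):
        m[pi[j]][j] = 1      # column j has its single 1 in row pi(j)
    return m

# The cycle 1 -> 2 -> 3 -> 4 -> 1, written 0-based as 0 -> 1 -> 2 -> 3 -> 0.
cycle = [1, 2, 3, 0]
for row in permutation_matrix(cycle):
    print(row)
# [0, 0, 0, 1]
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 1, 0]
```

The output has the shape of the matrix displayed above for ${}k=4$: the ${}j$-th standard vector is sent to the ${}\pi (j)$-th standard vector.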

Lemma

The characteristic polynomial of a permutation matrix ${}M_{\rho }$ for a cycle ${}\rho \in S_{n}$ of order ${}k$ is

{}\chi _{M_{\rho }}=(X-1)^{n-k}(X^{k}-1)\,.

Proof

We may assume that the cycle has the form ${}1\mapsto 2\mapsto 3\mapsto \ldots \mapsto k\mapsto 1$. The corresponding permutation matrix ${}M_{\rho }$ acts as the identity on ${}e_{k+1},\ldots ,e_{n}$ and has, with respect to the first ${}k$ standard vectors, the form

{\begin{pmatrix}0&0&\ldots &0&1\\1&0&0&\ldots &0\\0&1&0&\ldots &0\\\vdots &\ddots &\ddots &\ddots &\vdots \\0&\ldots &0&1&0\end{pmatrix}}.

The determinant of ${}XE_{n}-M_{\rho }$ is ${}(X-1)^{n-k}$ multiplied by the determinant of

{\begin{pmatrix}X&0&\ldots &0&-1\\-1&X&0&\ldots &0\\0&-1&X&\ldots &0\\\vdots &\ddots &\ddots &\ddots &\vdots \\0&\ldots &0&-1&X\end{pmatrix}}.

The expansion with respect to the first row yields

{}X^{k}+(-1)^{k+1}(-1)(-1)^{k-1}=X^{k}-1\,.

\Box
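The formula can be tested for a small case without symbolic computation. The sketch below takes ${}n=5$ and ${}k=3$ (an arbitrary choice) and compares ${}\det(xE_{n}-M_{\rho })$ with ${}(x-1)^{n-k}(x^{k}-1)$ at seven integer points. Since both sides are polynomials in ${}x$ of degree ${}n=5$, agreement at more than ${}n$ points means they are equal. The determinant is computed by Laplace expansion along the first row, which is fine for small matrices only; all names are illustrative.

```python
def det(a):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def cycle_matrix(n, k):
    """Permutation matrix of the cycle 0 -> 1 -> ... -> k-1 -> 0 on {0, ..., n-1}."""
    pi = [(j + 1) % k if j < k else j for j in range(n)]
    m = [[0] * n for _ in range(n)]
    for j in range(n):
        m[pi[j]][j] = 1
    return m

n, k = 5, 3
M = cycle_matrix(n, k)

mismatches = []
for x in range(-3, 4):     # 7 sample points, more than the degree n = 5
    x_e_minus_m = [[(x if i == j else 0) - M[i][j] for j in range(n)] for i in range(n)]
    if det(x_e_minus_m) != (x - 1) ** (n - k) * (x ** k - 1):
        mismatches.append(x)

print(mismatches)   # [] means both sides agree at every sample point
```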

Lemma

For a permutation matrix ${}M_{\rho }$ over ${}\mathbb {C}$ for a cycle ${}\rho \in S_{n}$ with ${}\rho :i_{1}\mapsto i_{2}\mapsto \ldots \mapsto i_{k}\mapsto i_{1}$ and a ${}k$ -th root of unity ${}\zeta$ , the vectors

{}v_{\zeta }:=\zeta ^{k-1}e_{i_{1}}+\zeta ^{k-2}e_{i_{2}}+\cdots +\zeta e_{i_{k-1}}+e_{i_{k}}\,

are eigenvectors of ${}M_{\rho }$ for the eigenvalue ${}\zeta$. In particular, a permutation matrix of a cycle over ${}\mathbb {C}$ is diagonalizable.

Proof

We have

{}{\begin{aligned}M_{\rho }(v_{\zeta })&=M_{\rho }(\zeta ^{k-1}e_{i_{1}}+\zeta ^{k-2}e_{i_{2}}+\cdots +\zeta e_{i_{k-1}}+e_{i_{k}})\\&=\zeta ^{k-1}M_{\rho }(e_{i_{1}})+\zeta ^{k-2}M_{\rho }(e_{i_{2}})+\cdots +\zeta M_{\rho }(e_{i_{k-1}})+M_{\rho }(e_{i_{k}})\\&=\zeta ^{k-1}e_{i_{2}}+\zeta ^{k-2}e_{i_{3}}+\cdots +\zeta e_{i_{k}}+e_{i_{1}}\\&=e_{i_{1}}+\zeta ^{k-1}e_{i_{2}}+\zeta ^{k-2}e_{i_{3}}+\cdots +\zeta e_{i_{k}}\\&=\zeta (\zeta ^{k-1}e_{i_{1}}+\zeta ^{k-2}e_{i_{2}}+\cdots +\zeta e_{i_{k-1}}+e_{i_{k}})\\&=\zeta v_{\zeta }.\end{aligned}}

Since there are ${}k$ different ${}k$ -th roots of unity in ${}\mathbb {C}$ , these vectors are linearly independent due to Lemma 22.3 , and they generate a ${}k$ -dimensional linear subspace ${}U$ of ${}\mathbb {C} ^{n}$ . In fact, we have

{}U=\langle e_{i_{j}},\,j=1,\ldots ,k\rangle \,.

Since the vectors ${}e_{i}$ , ${}i\neq {i_{j}}$ , are fixed vectors, the ${}v_{\zeta }$ together with the ${}e_{i}$ , ${}i\neq {i_{j}}$ , form a basis consisting of eigenvectors of ${}M_{\rho }$ . Hence, ${}M_{\rho }$ is diagonalizable.

\Box
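The eigenvector formula can be checked numerically for a cycle that does not use the first indices. In this sketch, the cycle ${}i_{1}\mapsto i_{2}\mapsto i_{3}\mapsto i_{1}$ is the hypothetical example `[1, 3, 4]` inside ${}\{0,\ldots ,4\}$ (0-based indices), and the matrix is applied through its action on coordinates instead of being stored. The comparison is only up to floating-point rounding.

```python
import cmath

n = 5
cycle = [1, 3, 4]      # i_1 -> i_2 -> i_3 -> i_1, 0-based indices (example choice)
k = len(cycle)

def apply_cycle_matrix(v):
    """Apply M_rho: e_{i_j} is sent to e_{i_{j+1}}, all other standard vectors are fixed."""
    w = list(v)
    for j in range(k):
        w[cycle[(j + 1) % k]] = v[cycle[j]]
    return w

def eigenvector(zeta):
    """v_zeta = zeta^(k-1) e_{i_1} + zeta^(k-2) e_{i_2} + ... + e_{i_k}."""
    v = [0j] * n
    for j in range(k):
        v[cycle[j]] = zeta ** (k - 1 - j)
    return v

max_error = 0.0
for m in range(k):
    zeta = cmath.exp(2j * cmath.pi * m / k)     # a k-th root of unity
    v = eigenvector(zeta)
    w = apply_cycle_matrix(v)
    # Compare M_rho(v_zeta) with zeta * v_zeta coordinate by coordinate.
    max_error = max(max_error, max(abs(w[i] - zeta * v[i]) for i in range(n)))

print(max_error < 1e-9)
```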

Theorem

A permutation matrix over ${}\mathbb {C}$ is diagonalizable.

Proof

See Exercise 24.23 .

\Box
