In the course of the previous lecture we essentially proved the following theorem:
1) If an $n \times n$ matrix $\mathbf{A}$ has $n$ linearly independent real or complex eigenvectors, then $\mathbf{A}$ can be diagonalized.
2) If $\mathbf{T}$ is a matrix whose columns are the eigenvectors of $\mathbf{A}$, then $\mathbf{T}^{-1}\mathbf{A}\mathbf{T} = \boldsymbol{\Lambda}$ is the diagonal matrix of eigenvalues.
The factorization $\mathbf{A} = \mathbf{T}\,\boldsymbol{\Lambda}\,\mathbf{T}^{-1}$ is called the spectral representation of $\mathbf{A}$.
We can use the spectral representation to solve a system of linear homogeneous ordinary
differential equations.
For example, suppose we wish to solve the system
$$\cfrac{d\mathbf{u}}{dt} = \mathbf{A}\mathbf{u} = \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$$
(More generally, $\mathbf{A}$ could be an $n \times n$ matrix.)
Higher order ordinary differential equations can be reduced to this form. For example,
$$\cfrac{d^2 u_1}{dt^2} + a~\cfrac{du_1}{dt} = b~u_1$$
Introduce
$$u_2 = \cfrac{du_1}{dt}$$
Then the system of equations is
$$\begin{aligned} \cfrac{du_1}{dt} &= u_2 \\ \cfrac{du_2}{dt} &= b~u_1 - a~u_2 \end{aligned}$$
or,
$$\cfrac{d\mathbf{u}}{dt} = \begin{bmatrix} 0 & 1 \\ b & -a \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \mathbf{A}\mathbf{u}$$
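As a quick illustration of this reduction (a minimal sketch, not part of the lecture; the coefficient values a = 3, b = -2 and the initial condition are arbitrary), one can build the companion matrix and integrate the resulting first-order system numerically:

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b = 3.0, -2.0                      # arbitrary illustrative coefficients
A = np.array([[0.0, 1.0],
              [b,  -a]])              # companion matrix of u1'' + a u1' = b u1

# u = (u1, u2) with u2 = du1/dt, so du/dt = A u
sol = solve_ivp(lambda t, u: A @ u, (0.0, 5.0), [1.0, 0.0],
                t_eval=np.linspace(0.0, 5.0, 11))
print(sol.y[0])                       # samples of u1(t)
```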
Returning to the original problem, let us find the eigenvalues and eigenvectors of $\mathbf{A}$. The characteristic equation is
$$\det(\mathbf{A} - \lambda~\mathbf{I}) = 0$$
so we can calculate the eigenvalues as
$$(2+\lambda)(2+\lambda) - 1 = 0 \quad\implies\quad \lambda^2 + 4\lambda + 3 = 0 \quad\implies\quad \lambda_1 = -1, \quad \lambda_2 = -3$$
The eigenvectors are given by
$$(\mathbf{A} - \lambda_1~\mathbf{I})\,\mathbf{n}_1 = \mathbf{0}~;~~(\mathbf{A} - \lambda_2~\mathbf{I})\,\mathbf{n}_2 = \mathbf{0}$$
or,
$$-n_1^1 + n_2^1 = 0~;~~ n_1^1 - n_2^1 = 0~;~~ n_1^2 + n_2^2 = 0~;~~ n_1^2 + n_2^2 = 0$$
Possible choices of $\mathbf{n}_1$ and $\mathbf{n}_2$ are
$$\mathbf{n}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}~;~~ \mathbf{n}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
The matrix $\mathbf{T}$ is the one whose columns are the eigenvectors of $\mathbf{A}$, i.e.,
$$\mathbf{T} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
and
$$\boldsymbol{\Lambda} = \mathbf{T}^{-1}\mathbf{A}\mathbf{T} = \begin{bmatrix} -1 & 0 \\ 0 & -3 \end{bmatrix}$$
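A quick numerical cross-check of these eigenvalues and of the diagonalization (a minimal sketch using numpy; note that np.linalg.eig may return the eigenvalues in a different order and with differently scaled eigenvectors than the hand computation above):

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])

eigvals, eigvecs = np.linalg.eig(A)     # eigenvalues and (unit) eigenvectors
print(eigvals)                           # approximately [-1., -3.], in some order

T = np.array([[1.0, 1.0],
              [1.0, -1.0]])              # columns are the eigenvectors chosen above
Lam = np.linalg.inv(T) @ A @ T
print(np.round(Lam, 12))                 # diag(-1, -3)
print(np.allclose(A, T @ Lam @ np.linalg.inv(T)))   # spectral representation holds
```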
If $\mathbf{u} = \mathbf{T}\mathbf{u}'$, the system of equations becomes
$$\cfrac{d\mathbf{u}'}{dt} = \mathbf{T}^{-1}\mathbf{A}\mathbf{T}\,\mathbf{u}' = \boldsymbol{\Lambda}~\mathbf{u}'$$
Expanded out,
$$\cfrac{du_1'}{dt} = -u_1'~;~~ \cfrac{du_2'}{dt} = -3~u_2'$$
The solutions of these equations are
$$u_1' = C_1~e^{-t}~;~~ u_2' = C_2~e^{-3t}$$
Therefore,
$$\mathbf{u} = \mathbf{T}~\mathbf{u}' = \begin{bmatrix} C_1~e^{-t} + C_2~e^{-3t} \\ C_1~e^{-t} - C_2~e^{-3t} \end{bmatrix}$$
This is the solution of the system of ODEs that we seek.
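As a sanity check of the closed-form solution (a minimal sketch; the initial condition $\mathbf{u}(0) = (1, 0)$, which corresponds to $C_1 = C_2 = 1/2$, is an arbitrary choice for illustration), we can compare it with the matrix-exponential solution $\mathbf{u}(t) = e^{\mathbf{A}t}\mathbf{u}(0)$:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
u0 = np.array([1.0, 0.0])              # u(0) = (1, 0) gives C1 = C2 = 1/2
C1, C2 = 0.5, 0.5

for t in np.linspace(0.0, 2.0, 5):
    u_closed = np.array([C1*np.exp(-t) + C2*np.exp(-3*t),
                         C1*np.exp(-t) - C2*np.exp(-3*t)])
    u_expm = expm(A * t) @ u0          # reference solution of du/dt = A u
    assert np.allclose(u_closed, u_expm)
print("closed-form solution matches exp(A t) u0")
```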
Most "generic" matrices have linearly independent eigenvectors. Generally a matrix will
have
n
{\displaystyle n}
distinct eigenvalues unless there are symmetries that lead to repeated values.
If $\mathbf{A}$ has $k$ distinct eigenvalues then it has $k$ linearly independent eigenvectors.
Proof:
We prove this by induction.
Let $\mathbf{n}_j$ be the eigenvector corresponding to the eigenvalue $\lambda_j$. Suppose $\mathbf{n}_1, \mathbf{n}_2, \dots, \mathbf{n}_{k-1}$ are linearly independent (note that this is true for $k = 2$). The question then becomes: Do there exist $\alpha_1, \alpha_2, \dots, \alpha_k$, not all zero, such that the linear combination
$$\alpha_1~\mathbf{n}_1 + \alpha_2~\mathbf{n}_2 + \dots + \alpha_k~\mathbf{n}_k = \mathbf{0}~?$$
Let us multiply the above by $(\mathbf{A} - \lambda_k~\mathbf{I})$. Then, since $\mathbf{A}~\mathbf{n}_i = \lambda_i~\mathbf{n}_i$, we have
$$\alpha_1~(\lambda_1 - \lambda_k)~\mathbf{n}_1 + \alpha_2~(\lambda_2 - \lambda_k)~\mathbf{n}_2 + \dots + \alpha_{k-1}~(\lambda_{k-1} - \lambda_k)~\mathbf{n}_{k-1} + \alpha_k~(\lambda_k - \lambda_k)~\mathbf{n}_k = \mathbf{0}$$
The last term vanishes. Since the eigenvalues are distinct, $\lambda_j - \lambda_k \neq 0$ for $j = 1, \dots, k-1$, and since $\mathbf{n}_1, \dots, \mathbf{n}_{k-1}$ are linearly independent, the above can hold only when
$$\alpha_1 = \alpha_2 = \dots = \alpha_{k-1} = 0$$
Substituting back into the original linear combination, we must then have
$$\alpha_k~\mathbf{n}_k = \mathbf{0} \quad\implies\quad \alpha_k = 0$$
This contradicts the assumption that the $\alpha_i$ are not all zero.
Therefore $\mathbf{n}_1, \mathbf{n}_2, \dots, \mathbf{n}_k$ are linearly independent. $\qquad\square$
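A small numerical illustration of this result (a sketch; the upper-triangular matrix below, with distinct eigenvalues 2, 3, and 5, is an arbitrary example and not from the lecture):

```python
import numpy as np

# An arbitrary illustrative matrix with distinct eigenvalues 2, 3, 5.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.round(np.sort(eigvals), 8))          # [2. 3. 5.] -- all distinct

# Columns of eigvecs are eigenvectors; full rank confirms that they
# are linearly independent.
print(np.linalg.matrix_rank(eigvecs))         # 3
```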
Another important class of diagonalizable matrices is that of self-adjoint matrices.
If $\boldsymbol{A}$ is self-adjoint, the following statements are true:
1) $\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle$ is real for all $\mathbf{x}$.
2) All eigenvalues are real.
3) Eigenvectors of distinct eigenvalues are orthogonal.
4) There is an orthonormal basis formed by the eigenvectors.
5) The matrix $\boldsymbol{A}$ can be diagonalized (this is a consequence of the previous statement).
Proof:
1) Because the matrix is self-adjoint we have
$$\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle = \langle \mathbf{x}, \boldsymbol{A}\mathbf{x} \rangle$$
From the property of the inner product we have
$$\langle \mathbf{x}, \boldsymbol{A}\mathbf{x} \rangle = \overline{\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle}$$
Therefore,
$$\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle = \overline{\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle}$$
which implies that $\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle$ is real.
2) From part (1), $\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle$ is real; similarly, $\langle \boldsymbol{I}\mathbf{x}, \mathbf{x} \rangle = \langle \mathbf{x}, \mathbf{x} \rangle$ is real (and positive for $\mathbf{x} \neq \mathbf{0}$).
Also, from the eigenvalue problem, we have
$$\langle \boldsymbol{A}\mathbf{x}, \mathbf{x} \rangle = \lambda~\langle \mathbf{x}, \mathbf{x} \rangle$$
Therefore, $\lambda$ is real.
3) If $(\lambda, \mathbf{x})$ and $(\mu, \mathbf{y})$ are two eigenpairs then
$$\lambda~\langle \mathbf{x}, \mathbf{y} \rangle = \langle \boldsymbol{A}\mathbf{x}, \mathbf{y} \rangle$$
Since the matrix is self-adjoint, we have
$$\lambda~\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, \boldsymbol{A}\mathbf{y} \rangle = \mu~\langle \mathbf{x}, \mathbf{y} \rangle$$
Therefore, if $\lambda \neq \mu$, we must have
$$\langle \mathbf{x}, \mathbf{y} \rangle = 0$$
Hence the eigenvectors are orthogonal.
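Parts (2) and (3) can be checked numerically (a minimal sketch; the Hermitian test matrix below is randomly generated and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                       # an arbitrary Hermitian (self-adjoint) matrix

eigvals, eigvecs = np.linalg.eigh(A)     # eigh is designed for Hermitian matrices
print(np.allclose(eigvals.imag, 0.0))    # eigenvalues are real -> True
# eigh returns orthonormal eigenvectors, so V^H V = I
print(np.allclose(eigvecs.conj().T @ eigvecs, np.eye(4)))   # True
```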
4) This part is a bit more involved. We need to define a manifold first.
A linear manifold (or vector subspace) $\mathcal{M} \subset \mathcal{S}$ is a subset of $\mathcal{S}$ which is closed under scalar multiplication and vector addition.
Examples are a line through the origin of $n$-dimensional space, a plane through the origin, the whole space, the zero vector, etc.
An invariant manifold $\mathcal{M}$ for the matrix $\boldsymbol{A}$ is a linear manifold for which $\mathbf{x} \in \mathcal{M}$ implies $\boldsymbol{A}\mathbf{x} \in \mathcal{M}$.
Examples are the null space and range of a matrix $\boldsymbol{A}$. For the case of a rotation about an axis through the origin in three-dimensional space, invariant manifolds are the origin, the plane perpendicular to the axis, the whole space, and the axis itself.
Now suppose $\mathcal{M}$ is an invariant manifold of $\boldsymbol{A}$. If $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ are a basis for $\mathcal{M}$ and $\mathbf{x}_{m+1}, \dots, \mathbf{x}_n$ are a basis for $\mathcal{M}_\perp$ (the orthogonal complement of $\mathcal{M}$), then in this basis $\boldsymbol{A}$ has the representation
$$\boldsymbol{A} = \left[\begin{array}{cc|cc} x & x & x & x \\ x & x & x & x \\ \hline 0 & 0 & x & x \\ 0 & 0 & x & x \end{array}\right]$$
The zero block appears precisely because $\mathcal{M}$ is an invariant manifold of $\boldsymbol{A}$: vectors in $\mathcal{M}$ are mapped back into $\mathcal{M}$, with no component in $\mathcal{M}_\perp$.
Note that if $\mathcal{M}$ is an invariant manifold of $\boldsymbol{A}$ it does not follow that $\mathcal{M}_\perp$ is also an invariant manifold.
Now, if $\boldsymbol{A}$ is self-adjoint then the entries in the off-diagonal block must be zero too. In that case, $\boldsymbol{A}$ is block diagonal in this basis.
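A small numerical illustration of this block structure (a sketch; the symmetric matrix and the invariant manifold below are arbitrary choices): take $\mathcal{M}$ to be the span of two eigenvectors, choose an orthonormal basis of $\mathcal{M}$ that does not consist of eigenvectors, and express $\boldsymbol{A}$ in the adapted basis.

```python
import numpy as np

# An arbitrary symmetric (self-adjoint) test matrix.
A = np.array([[4., 1., 0., 2.],
              [1., 3., 1., 0.],
              [0., 1., 2., 1.],
              [2., 0., 1., 5.]])

lam, V = np.linalg.eigh(A)            # orthonormal eigenvectors in the columns of V

# M = span of the first two eigenvectors, but with a "scrambled" orthonormal
# basis inside M (so the basis vectors are not eigenvectors themselves).
M_basis, _ = np.linalg.qr(V[:, :2] @ np.array([[1., 1.],
                                               [1., -2.]]))
Mp_basis = V[:, 2:]                   # orthonormal basis of the perpendicular manifold

Q = np.hstack([M_basis, Mp_basis])    # adapted orthonormal basis of the whole space
B = Q.T @ A @ Q                       # representation of A in this basis

# Off-diagonal blocks vanish: A is self-adjoint and M is invariant.
print(np.round(B, 10))
```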
Getting back to part (4), we know that there exists at least one eigenpair $(\lambda_1, \mathbf{x}_1)$ (this is true for any matrix). We now use induction. Suppose that we have found $k-1$
mutually orthogonal eigenvectors
$\mathbf{x}_i$ with $\boldsymbol{A}\mathbf{x}_i = \lambda_i~\mathbf{x}_i$ and the $\lambda_i$ real, $i = 1, \dots, k-1$. (Note that the span of each $\mathbf{x}_i$ is an invariant manifold of $\boldsymbol{A}$, as is the space spanned by all the $\mathbf{x}_i$, and so is the manifold perpendicular to these vectors.)
We form the linear manifold
$$\mathcal{M}_k = \{\mathbf{x} ~|~ \langle \mathbf{x}, \mathbf{x}_j \rangle = 0, ~~ j = 1, 2, \dots, k-1\}$$
This is the orthogonal complement of the span of the $k-1$ eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{k-1}$.
If $\mathbf{x} \in \mathcal{M}_k$ then
$$\langle \mathbf{x}, \mathbf{x}_j \rangle = 0 \quad\text{and}\quad \langle \boldsymbol{A}\mathbf{x}, \mathbf{x}_j \rangle = \langle \mathbf{x}, \boldsymbol{A}\mathbf{x}_j \rangle = \lambda_j \langle \mathbf{x}, \mathbf{x}_j \rangle = 0$$
Therefore $\boldsymbol{A}\mathbf{x} \in \mathcal{M}_k$, which means that $\mathcal{M}_k$ is invariant.
Since $\mathcal{M}_k$ is invariant and $\boldsymbol{A}$ is self-adjoint, the restriction of $\boldsymbol{A}$ to $\mathcal{M}_k$ is again self-adjoint and has at least one eigenpair. Hence $\mathcal{M}_k$ contains at least one eigenvector $\mathbf{x}_k$ with a real eigenvalue $\lambda_k$.
We can repeat the procedure to get a diagonal matrix in the lower block of the block diagonal representation of $\boldsymbol{A}$. We then get $n$ mutually orthogonal eigenvectors and so $\boldsymbol{A}$ can be diagonalized. This implies that the eigenvectors form an orthogonal basis.
5) This follows from the previous result because each eigenvector can be normalized so that $\langle \mathbf{x}_i, \mathbf{x}_j \rangle = \delta_{ij}$.
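As a final numerical check of statements (4) and (5) (a minimal sketch with an arbitrary real symmetric matrix), the orthonormal eigenvector basis returned by np.linalg.eigh diagonalizes $\boldsymbol{A}$:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # arbitrary real symmetric (self-adjoint) matrix

lam, Q = np.linalg.eigh(A)             # Q has orthonormal eigenvector columns
print(np.allclose(Q.T @ Q, np.eye(5)))         # orthonormal basis -> True
print(np.allclose(Q.T @ A @ Q, np.diag(lam)))  # A is diagonalized -> True
```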
We will explore some more of these ideas in the next lecture.