In the last lecture we talked about norms in inner product spaces. The
induced norm was defined as
\[
\lVert \mathbf{x} \rVert = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} = \lVert \mathbf{x} \rVert_2 \,.
\]
We also talked about orthonormal bases and biorthonormal bases. The biorthonormal
bases may be thought of as dual bases in the sense that covariant and
contravariant vector bases are dual.
The last thing we talked about was the idea of a linear operator. Recall that
\[
\boldsymbol{A}\,\boldsymbol{\varphi}_j = \sum_i A_{ij}\,\boldsymbol{\varphi}_i \equiv A_{ij}\,\boldsymbol{\varphi}_i
\]
where the summation is on the first index.
In this lecture we will learn about adjoint operators, Jacobi tridiagonalization,
and a bit about the spectral theory of matrices.
Assume that we have a vector space with an orthonormal basis. Then
\[
\langle \boldsymbol{A}\,\boldsymbol{\varphi}_j, \boldsymbol{\varphi}_i \rangle = A_{kj}\,\langle \boldsymbol{\varphi}_k, \boldsymbol{\varphi}_i \rangle = A_{kj}\,\delta_{ki} = A_{ij} \,.
\]
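As a quick numerical illustration (a sketch, assuming NumPy; the specific matrix is made up), the components \(A_{ij}\) of an operator in an orthonormal basis can be recovered from inner products in exactly this way. Here the standard basis vectors \(\mathbf{e}_i\) play the role of the \(\boldsymbol{\varphi}_i\), and the inner product \(\langle\mathbf{u},\mathbf{v}\rangle = \mathbf{v}^{H}\mathbf{u}\) is linear in the first slot, as in the formulas above.

```python
import numpy as np

# A made-up 3x3 complex matrix acting on C^3; the standard basis
# vectors e_1, e_2, e_3 (columns of the identity) are orthonormal.
A = np.array([[1.0 + 2.0j, 0.5, 3.0],
              [2.0, 1.0j, -1.0],
              [0.0, 4.0, 2.0 - 1.0j]])
e = np.eye(3)

# A_ij = <A e_j, e_i>; with <u, v> = v^H u this is e_i^H (A e_j).
A_from_inner_products = np.array(
    [[np.vdot(e[:, i], A @ e[:, j]) for j in range(3)] for i in range(3)])

assert np.allclose(A_from_inner_products, A)
```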
One specific matrix connected with \(\boldsymbol{A}\) is the Hermitian conjugate matrix.
This matrix is defined as
\[
A_{ij}^{*} = \overline{A}_{ji} \,.
\]
The linear operator \(\boldsymbol{A}^{*}\) associated with the Hermitian conjugate matrix is called the adjoint operator and is defined by
\[
\boldsymbol{A}^{*} = \overline{\boldsymbol{A}}^{T} \,.
\]
Therefore,
\[
\langle \boldsymbol{A}^{*}\,\boldsymbol{\varphi}_j, \boldsymbol{\varphi}_i \rangle = \overline{A}_{ji}
\]
and
\[
\langle \boldsymbol{\varphi}_i, \boldsymbol{A}^{*}\,\boldsymbol{\varphi}_j \rangle = A_{ji} = \langle \boldsymbol{A}\,\boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle \,.
\]
More generally, if
\[
\mathbf{f} = \sum_i \alpha_i\,\boldsymbol{\varphi}_i \qquad \text{and} \qquad \mathbf{g} = \sum_j \beta_j\,\boldsymbol{\varphi}_j \,,
\]
then
\[
\langle \mathbf{f}, \boldsymbol{A}^{*}\,\mathbf{g} \rangle = \sum_{i,j} \alpha_i\,\overline{\beta}_j\,\langle \boldsymbol{\varphi}_i, \boldsymbol{A}^{*}\,\boldsymbol{\varphi}_j \rangle = \sum_{i,j} \alpha_i\,\overline{\beta}_j\,\langle \boldsymbol{A}\,\boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle = \langle \boldsymbol{A}\,\mathbf{f}, \mathbf{g} \rangle \,.
\]
Since the above relation does not involve the basis, we see that the adjoint operator is also basis independent.
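The relation \(\langle \mathbf{f}, \boldsymbol{A}^{*}\,\mathbf{g} \rangle = \langle \boldsymbol{A}\,\mathbf{f}, \mathbf{g} \rangle\) is easy to verify numerically. The following is a minimal sketch (assuming NumPy and random test data; the inner product is again \(\langle\mathbf{u},\mathbf{v}\rangle = \mathbf{v}^{H}\mathbf{u}\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random complex matrix and vectors in C^4.
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
f = rng.standard_normal(4) + 1j * rng.standard_normal(4)
g = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Adjoint (Hermitian conjugate): A* is the conjugate transpose of A.
A_star = A.conj().T

def inner(u, v):
    """Inner product <u, v> = sum_k u_k conj(v_k), linear in the first slot."""
    return np.vdot(v, u)

# <f, A* g> equals <A f, g> for every f and g.
assert np.isclose(inner(f, A_star @ g), inner(A @ f, g))
```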
Self-adjoint/Hermitian matrices
If \(\boldsymbol{A}^{*} = \boldsymbol{A}\) we say that \(\boldsymbol{A}\) is self-adjoint, i.e.,
\[
A_{ij} = \overline{A}_{ji}
\]
in any orthonormal basis, and the matrix \(A_{ij}\) is said to be Hermitian.
Anti-Hermitian matrices
A matrix \(\mathbf{B}\) is anti-Hermitian if
\[
B_{ij} = -\overline{B}_{ji} \,.
\]
There is a close connection between Hermitian and anti-Hermitian matrices.
Consider a matrix \(\mathbf{A} = i\,\mathbf{B}\). Then
\[
A_{ij} = i\,B_{ij} = -i\,\overline{B}_{ji} = \overline{A}_{ji} \,.
\]
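In other words, multiplying an anti-Hermitian matrix by \(i\) produces a Hermitian matrix. A quick numerical check (a sketch assuming NumPy; the matrix is made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an anti-Hermitian matrix B = M - M^H, so that B^H = -B.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = M - M.conj().T
assert np.allclose(B.conj().T, -B)   # B is anti-Hermitian

# A = i B is then Hermitian: A^H = A.
A = 1j * B
assert np.allclose(A.conj().T, A)
```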
Jacobi Tridiagonalization
Let \(\boldsymbol{A}\) be self-adjoint and suppose that we want to solve
\[
(\boldsymbol{I} - \lambda\,\boldsymbol{A})\,\mathbf{y} = \mathbf{b}
\]
where \(\lambda\) is constant. We expect that
\[
\mathbf{y} = (\boldsymbol{I} - \lambda\,\boldsymbol{A})^{-1}\,\mathbf{b} \,.
\]
If \(\lambda\,\boldsymbol{A}\) is "sufficiently" small, then
\[
\mathbf{y} = (\boldsymbol{I} + \lambda\,\boldsymbol{A} + \lambda^2\,\boldsymbol{A}^2 + \lambda^3\,\boldsymbol{A}^3 + \dots)\,\mathbf{b} \,.
\]
This suggests that the solution should lie in the subspace spanned by
\[
\mathbf{b},\ \boldsymbol{A}\,\mathbf{b},\ \boldsymbol{A}^2\,\mathbf{b},\ \dots \,.
\]
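The Neumann series above can be checked numerically. The sketch below (assuming NumPy; the matrix, right-hand side, and \(\lambda\) are made up for illustration) compares the partial sums against the exact solution of \((\boldsymbol{I} - \lambda\,\boldsymbol{A})\,\mathbf{y} = \mathbf{b}\):

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric (self-adjoint) test matrix and right-hand side.
M = rng.standard_normal((5, 5))
A = 0.5 * (M + M.T)
b = rng.standard_normal(5)

# Choose lambda small enough that the series converges:
# |lambda| * ||A|| < 1 in the spectral norm.
lam = 0.5 / np.linalg.norm(A, 2)

y_exact = np.linalg.solve(np.eye(5) - lam * A, b)

# Partial sum y = sum_{k=0}^{49} lambda^k A^k b, built term by term.
y, term = np.zeros(5), b.copy()
for _ in range(50):
    y += term
    term = lam * (A @ term)

print(np.linalg.norm(y - y_exact))   # tiny: the series has converged
```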
Let us apply the Gram-Schmidt orthogonalization procedure where
\[
\mathbf{x}_1 = \mathbf{b},\quad \mathbf{x}_2 = \boldsymbol{A}\,\mathbf{b},\quad \mathbf{x}_3 = \boldsymbol{A}^2\,\mathbf{b},\quad \dots \,.
\]
Then we have
\[
\boldsymbol{\varphi}_n = \boldsymbol{A}^{n-1}\,\mathbf{b} - \sum_{j=1}^{n-1} \frac{\langle \boldsymbol{A}^{n-1}\,\mathbf{b}, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\boldsymbol{\varphi}_j \,.
\]
This is clearly a linear combination of
\((\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n)\).
Therefore,
\(\boldsymbol{A}\,\boldsymbol{\varphi}_n\)
is a linear combination of
\(\boldsymbol{A}\,(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n) = (\mathbf{x}_2, \mathbf{x}_3, \dots, \mathbf{x}_{n+1})\). This
is the same as saying that
\(\boldsymbol{A}\,\boldsymbol{\varphi}_n\)
is a linear combination of
\(\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_{n+1}\).
Therefore,
\[
\langle \boldsymbol{A}\,\boldsymbol{\varphi}_n, \boldsymbol{\varphi}_k \rangle = 0 \quad \text{if}\ k > n+1 \,.
\]
Now,
\[
A_{kn} = \langle \boldsymbol{A}\,\boldsymbol{\varphi}_n, \boldsymbol{\varphi}_k \rangle \,.
\]
But the self-adjointness of \(\boldsymbol{A}\) implies that
\[
A_{nk} = \overline{A_{kn}} \,.
\]
So
\[
A_{kn} = 0 \quad \text{if}\ k > n+1 \ \text{or}\ n > k+1 \,.
\]
This is equivalent to expressing
the operator
\(\boldsymbol{A}\) as a tridiagonal matrix \(\mathbf{A}\) which has the form
\[
\mathbf{A} = \begin{bmatrix}
x & x & 0 & \dots & \dots & \dots & 0 \\
x & x & x & \ddots & & & \vdots \\
0 & x & x & \ddots & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & x & x & 0 \\
\vdots & & & \ddots & x & x & x \\
0 & \dots & \dots & \dots & 0 & x & x
\end{bmatrix} \,.
\]
In general, the matrix can be represented in block tridiagonal form.
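The sketch below (assuming NumPy; the symmetric matrix and starting vector are arbitrary) carries out the Gram-Schmidt process on the Krylov vectors \(\mathbf{b}, \boldsymbol{A}\,\mathbf{b}, \boldsymbol{A}^2\,\mathbf{b}, \dots\) and checks that the representation of \(\boldsymbol{A}\) in the resulting orthonormal basis is indeed tridiagonal:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 6
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)              # real symmetric, hence self-adjoint
b = rng.standard_normal(n)

# Gram-Schmidt on the Krylov vectors b, A b, A^2 b, ...
Q = []                           # orthonormal basis phi_1, phi_2, ...
x = b.copy()
for _ in range(n):
    phi = x.copy()
    for q in Q:
        phi -= np.dot(x, q) * q  # subtract projections onto earlier phi_j
    Q.append(phi / np.linalg.norm(phi))
    x = A @ x                    # next Krylov vector
Q = np.column_stack(Q)

# Representation of A in the new basis: T_kn = <A phi_n, phi_k>.
T = Q.T @ A @ Q
print(np.round(T, 6))            # entries with |k - n| > 1 vanish
```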
Another consequence of the Gram-Schmidt orthogonalization is the following.
Lemma:
Every finite dimensional inner-product space has an orthonormal basis.
Proof:
The proof is trivial. Just use Gram-Schmidt on any basis for that space and
normalize.
\(\square\)
A corollary of this is the following theorem.
Theorem:
Every finite dimensional inner product space is complete.
Recall that a space is complete if the limit of any Cauchy sequence of elements of that space also lies in that space.
Proof:
Let \(\{\mathbf{u}_k\}\) be a Cauchy sequence of elements in the subspace \(\mathcal{S}_n\)
with \(k = 1, \dots, \infty\). Also let
\(\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}\)
be an
orthonormal basis for the subspace
\(\mathcal{S}_n\).
Then
\[
\mathbf{u}_k = \sum_{j=1}^{n} \alpha_{kj}\,\mathbf{e}_j \,,
\]
where
\[
\alpha_{ki} = \langle \mathbf{u}_k, \mathbf{e}_i \rangle \,.
\]
By the Schwarz inequality
\[
|\alpha_{ki} - \alpha_{pi}| = |\langle \mathbf{u}_k, \mathbf{e}_i \rangle - \langle \mathbf{u}_p, \mathbf{e}_i \rangle| \le \lVert \mathbf{u}_k - \mathbf{u}_p \rVert \rightarrow 0 \,.
\]
Therefore,
\[
\alpha_{ki} - \alpha_{pi} \rightarrow 0 \,.
\]
But the \(\alpha\)s are just numbers. So, for fixed \(i\),
\(\{\alpha_{ki}\}\)
is
a Cauchy sequence in
\(\mathbb{R}\) (or \(\mathbb{C}\)) and so converges to
a number
\(\alpha_i\)
as
\(k \rightarrow \infty\), i.e.,
\[
\lim_{k\rightarrow\infty} \mathbf{u}_k = \sum_{i=1}^{n} \alpha_i\,\mathbf{e}_i \,,
\]
which is in the subspace
\(\mathcal{S}_n\).
\(\square\)
Spectral theory for matrices
Suppose \(\boldsymbol{A}\,\mathbf{x} = \mathbf{b}\)
is expressed in coordinates relative to some basis
\(\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\), i.e.,
\[
\boldsymbol{A}\,\boldsymbol{\varphi}_j = \sum_i A_{ij}\,\boldsymbol{\varphi}_i ~;\qquad \mathbf{x} = \sum_i x_i\,\boldsymbol{\varphi}_i ~;\qquad \mathbf{b} = \sum_i b_i\,\boldsymbol{\varphi}_i \,.
\]
Then
\[
\boldsymbol{A}\,\mathbf{x} = \boldsymbol{A}\sum_j x_j\,\boldsymbol{\varphi}_j = \sum_j x_j\,(\boldsymbol{A}\,\boldsymbol{\varphi}_j) = \sum_{i,j} x_j\,A_{ij}\,\boldsymbol{\varphi}_i \,.
\]
So
\(\boldsymbol{A}\,\mathbf{x} = \mathbf{b}\)
implies that
\[
\sum_{j} A_{ij}\,x_j = b_i \,.
\]
Now let us examine the effect of a change of basis to a new basis
\(\boldsymbol{\varphi}_1', \boldsymbol{\varphi}_2', \dots, \boldsymbol{\varphi}_n'\)
with
\[
\boldsymbol{\varphi}_i' = \sum_{j=1}^{n} C_{ji}\,\boldsymbol{\varphi}_j \,.
\]
For the new basis to be linearly independent,
\(\mathbf{C}\) should be invertible so that
\[
\boldsymbol{\varphi}_i = \sum_{m=1}^{n} C_{mi}^{-1}\,\boldsymbol{\varphi}_m' \,.
\]
Now,
\[
\mathbf{x} = \sum_j x_j\,\boldsymbol{\varphi}_j = \sum_{i,j} x_j\,C_{ij}^{-1}\,\boldsymbol{\varphi}_i' = \sum_i x_i'\,\boldsymbol{\varphi}_i' \,.
\]
Hence
\[
x_i' = C_{ij}^{-1}\,x_j \,.
\]
Similarly,
\[
b_i' = C_{ij}^{-1}\,b_j \,.
\]
Therefore
\[
\boldsymbol{A}\,\boldsymbol{\varphi}_i' = \sum_{j=1}^{n} C_{ji}\,\boldsymbol{A}\,\boldsymbol{\varphi}_j = \sum_{j,k} C_{ji}\,A_{kj}\,\boldsymbol{\varphi}_k = \sum_{j,k,m} C_{ji}\,A_{kj}\,C_{mk}^{-1}\,\boldsymbol{\varphi}_m' = \sum_m A_{mi}'\,\boldsymbol{\varphi}_m' \,.
\]
So we have
\[
A_{mi}' = \sum_{j,k} C_{mk}^{-1}\,A_{kj}\,C_{ji} \,.
\]
In matrix form,
\[
\mathbf{x}' = \mathbf{C}^{-1}\,\mathbf{x} ~;\qquad \mathbf{b}' = \mathbf{C}^{-1}\,\mathbf{b} ~;\qquad \mathbf{A}' = \mathbf{C}^{-1}\,\mathbf{A}\,\mathbf{C} \,,
\]
where the objects here are not operators or vectors but rather the matrices and
vectors representing them. They are therefore basis dependent.
In other words, the matrix equation \(\mathbf{A}\,\mathbf{x} = \mathbf{b}\) becomes \(\mathbf{A}'\,\mathbf{x}' = \mathbf{b}'\) in the new basis.
The transformation
\(\mathbf{A}' = \mathbf{C}^{-1}\,\mathbf{A}\,\mathbf{C}\)
is called a similarity transformation. Two matrices are equivalent or similar if there is a similarity transformation between them.
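A numerical illustration of these transformation rules (a minimal sketch assuming NumPy; the matrices and vectors are arbitrary): if \(\mathbf{A}\,\mathbf{x} = \mathbf{b}\) holds in the old basis, then \(\mathbf{A}'\,\mathbf{x}' = \mathbf{b}'\) holds in the new one.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
b = A @ x                        # so that A x = b holds by construction

C = rng.standard_normal((n, n))  # change-of-basis matrix (invertible here)
C_inv = np.linalg.inv(C)

# Transformed representations.
A_prime = C_inv @ A @ C          # similarity transformation
x_prime = C_inv @ x
b_prime = C_inv @ b

# The equation keeps the same form in the new basis.
assert np.allclose(A_prime @ x_prime, b_prime)
```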
Diagonalizing a matrix
Suppose we want to find a similarity transformation which makes \(\boldsymbol{A}\) diagonal, i.e.,
\[
\mathbf{A}' = \begin{bmatrix}
\lambda_1 & 0 & \dots & \dots & 0 \\
0 & \lambda_2 & 0 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \dots & \dots & \dots & \lambda_n
\end{bmatrix} = \boldsymbol{\Lambda} \,.
\]
Then,
\[
\mathbf{A}\,\mathbf{C} = \mathbf{C}\,\mathbf{A}' = \mathbf{C}\,\boldsymbol{\Lambda} \,.
\]
Let us write \(\mathbf{C}\) (which is an \(n \times n\) matrix) in terms of its columns:
\[
\mathbf{C} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix} \,.
\]
Then,
\[
\mathbf{A}\begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix}\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1\,\mathbf{x}_1 & \lambda_2\,\mathbf{x}_2 & \dots & \lambda_n\,\mathbf{x}_n \end{bmatrix} \,,
\]
i.e.,
\[
\mathbf{A}\,\mathbf{x}_i = \lambda_i\,\mathbf{x}_i \,.
\]
The pair
\((\lambda, \mathbf{x})\)
is said to be an eigenvalue pair if
\[
\mathbf{A}\,\mathbf{x} = \lambda\,\mathbf{x} \,,
\]
where
\(\mathbf{x}\)
is an eigenvector and
\(\lambda\)
is an eigenvalue.
Since
\[
(\mathbf{A} - \lambda\,\mathbf{I})\,\mathbf{x} = \mathbf{0}
\]
must hold for a nonzero \(\mathbf{x}\),
this means that
\(\lambda\) is an
eigenvalue if and only if
\[
\det(\mathbf{A} - \lambda\,\mathbf{I}) = 0 \,.
\]
The quantity on the left hand side is called the characteristic polynomial and has \(n\) roots (counting multiplicities).
In \(\mathbb{C}\) there is always at least one root. For that root
\(\mathbf{A} - \lambda\,\mathbf{I}\)
is singular, i.e., there always exists at least one eigenvector.
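In NumPy terms this is exactly what an eigendecomposition provides: the eigenvectors returned by `numpy.linalg.eig` form the columns of \(\mathbf{C}\), and the similarity transformation \(\mathbf{C}^{-1}\,\mathbf{A}\,\mathbf{C}\) is diagonal. A minimal sketch (the symmetric test matrix is made up; symmetry guarantees diagonalizability):

```python
import numpy as np

rng = np.random.default_rng(5)

# A real symmetric matrix is diagonalizable with real eigenvalues.
M = rng.standard_normal((4, 4))
A = 0.5 * (M + M.T)

eigvals, C = np.linalg.eig(A)    # columns of C are the eigenvectors x_i
Lam = np.diag(eigvals)

# A C = C Lambda, i.e., A x_i = lambda_i x_i column by column.
assert np.allclose(A @ C, C @ Lam)

# The similarity transformation C^{-1} A C gives the diagonal matrix Lambda.
assert np.allclose(np.linalg.inv(C) @ A @ C, Lam)
```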
We will delve a bit more into the spectral theory of matrices in the next
lecture.