A vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ can be assigned a family of $L_p$ norms, defined as

$$\lVert \mathbf{x} \rVert_p = \left(\sum_k |x_k|^p\right)^{1/p}, \quad p = 1, 2, \dots, \infty$$
When $p = 1$, we get the $L_1$ norm

$$\lVert \mathbf{x} \rVert_1 = \sum_k |x_k|$$
When $p = 2$, we get the $L_2$ norm

$$\lVert \mathbf{x} \rVert_2 = \left(\sum_k |x_k|^2\right)^{1/2} = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$$
In the limit as $p \rightarrow \infty$ we get the $L_\infty$ norm, or sup norm,

$$\lVert \mathbf{x} \rVert_\infty = \max_k |x_k|$$
The adjacent figure shows a geometric interpretation of the three norms.
Geometric interpretation of various norms
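As a concrete illustration (not part of the original derivation), the three norms of a vector in $\mathbb{R}^n$ can be computed with NumPy; the test vector below is arbitrary:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])       # an arbitrary test vector

norm_1 = np.sum(np.abs(x))           # L1 norm: sum of absolute values
norm_2 = np.sqrt(np.dot(x, x))       # L2 norm: square root of the inner product <x, x>
norm_inf = np.max(np.abs(x))         # sup norm: largest absolute component

print(norm_1, norm_2, norm_inf)      # 8.0  5.099...  4.0
```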
If a vector space has an inner product, then the norm

$$\lVert \mathbf{x} \rVert = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} = \lVert \mathbf{x} \rVert_2$$

is called the induced norm. Clearly, the induced norm is nonnegative and is zero only if $\mathbf{x} = \mathbf{0}$. It also satisfies $\lVert \alpha\,\mathbf{x} \rVert = |\alpha|\,\lVert \mathbf{x} \rVert$ for any scalar $\alpha$. You can think of the induced norm as a measure of length for the vector space.
Some useful results that follow from the definition of the norm are discussed below.
Schwarz inequality
In an inner product space

$$|\langle \mathbf{x}, \mathbf{y} \rangle| \leq \lVert \mathbf{x} \rVert~\lVert \mathbf{y} \rVert$$
Proof
This statement is clearly true if $\mathbf{y} = \mathbf{0}$, since both sides are then zero.
If $\mathbf{y} \neq \mathbf{0}$ we have

$$0 \leq \lVert \mathbf{x} - \alpha\,\mathbf{y} \rVert^2 = \langle (\mathbf{x} - \alpha\,\mathbf{y}), (\mathbf{x} - \alpha\,\mathbf{y}) \rangle = \langle \mathbf{x}, \mathbf{x} \rangle - \langle \mathbf{x}, \alpha\,\mathbf{y} \rangle - \langle \alpha\,\mathbf{y}, \mathbf{x} \rangle + |\alpha|^2\,\langle \mathbf{y}, \mathbf{y} \rangle$$
Now

$$\langle \mathbf{x}, \alpha\,\mathbf{y} \rangle + \langle \alpha\,\mathbf{y}, \mathbf{x} \rangle = \overline{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle + \alpha\,\langle \mathbf{y}, \mathbf{x} \rangle = 2\,{\text{Re}}\bigl(\overline{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle\bigr)$$
Therefore,

$$\lVert \mathbf{x} \rVert^2 - 2\,{\text{Re}}\bigl(\overline{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle\bigr) + |\alpha|^2\,\lVert \mathbf{y} \rVert^2 \geq 0$$
Let us choose $\alpha$ such that it minimizes the left hand side above. This value is clearly

$$\alpha = \cfrac{\langle \mathbf{x}, \mathbf{y} \rangle}{\lVert \mathbf{y} \rVert^2}$$
which gives us

$$\lVert \mathbf{x} \rVert^2 - 2\,\cfrac{|\langle \mathbf{x}, \mathbf{y} \rangle|^2}{\lVert \mathbf{y} \rVert^2} + \cfrac{|\langle \mathbf{x}, \mathbf{y} \rangle|^2}{\lVert \mathbf{y} \rVert^2} \geq 0$$
Therefore,

$$\lVert \mathbf{x} \rVert^2~\lVert \mathbf{y} \rVert^2 \geq |\langle \mathbf{x}, \mathbf{y} \rangle|^2 \qquad \square$$
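The inequality is easy to spot-check numerically. The following is only an illustrative sketch with arbitrary random vectors, not part of the proof:

```python
import numpy as np

# Spot check of the Schwarz inequality |<x, y>| <= ||x|| ||y||.
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    lhs = abs(np.dot(x, y))
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    assert lhs <= rhs + 1e-12        # small tolerance for round-off
```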
Triangle inequality
The triangle inequality states that

$$\lVert \mathbf{x} + \mathbf{y} \rVert \leq \lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert$$
Proof
$$\lVert \mathbf{x} + \mathbf{y} \rVert^2 = \lVert \mathbf{x} \rVert^2 + 2\,{\text{Re}}\,\langle \mathbf{x}, \mathbf{y} \rangle + \lVert \mathbf{y} \rVert^2$$
From the Schwarz inequality,

$$\lVert \mathbf{x} + \mathbf{y} \rVert^2 \leq \lVert \mathbf{x} \rVert^2 + 2\,\lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert + \lVert \mathbf{y} \rVert^2 = (\lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert)^2$$
Hence

$$\lVert \mathbf{x} + \mathbf{y} \rVert \leq \lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert \qquad \square$$
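A similar numerical spot check of the triangle inequality (again with arbitrary test vectors, as a sketch only):

```python
import numpy as np

# Spot check of ||x + y|| <= ||x|| + ||y|| in the induced (L2) norm.
rng = np.random.default_rng(1)
x = rng.standard_normal(6)
y = rng.standard_normal(6)
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
```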
Angle between two vectors
In $\mathbb{R}^2$ or $\mathbb{R}^3$ we have

$$\cos\theta = \cfrac{\langle \mathbf{x}, \mathbf{y} \rangle}{\lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert}$$
So it makes sense to define $\cos\theta$ in this way for any real vector space; the Schwarz inequality guarantees that the quantity on the right lies between $-1$ and $1$.
We then have

$$\lVert \mathbf{x} + \mathbf{y} \rVert^2 = \lVert \mathbf{x} \rVert^2 + 2\,\lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert \cos\theta + \lVert \mathbf{y} \rVert^2$$
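As a sketch, the angle can be computed directly from this formula; the helper name `angle_between` below is hypothetical, and the clipping only guards against round-off pushing the cosine slightly outside $[-1, 1]$:

```python
import numpy as np

def angle_between(x, y):
    """Angle (radians) defined through cos(theta) = <x, y> / (||x|| ||y||)."""
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

print(angle_between(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.7854 (45 degrees)
```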
Orthogonality
In particular, if $\cos\theta = 0$ we have an analog of the Pythagoras theorem:
$$\lVert \mathbf{x} + \mathbf{y} \rVert^2 = \lVert \mathbf{x} \rVert^2 + \lVert \mathbf{y} \rVert^2$$
In that case the vectors are said to be orthogonal. If $\langle \mathbf{x}, \mathbf{y} \rangle = 0$, the vectors are said to be orthogonal even in a complex vector space. Orthogonal vectors have a lot of nice properties.
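A quick numerical illustration of the Pythagorean relation for a pair of orthogonal vectors (arbitrary test data):

```python
import numpy as np

x = np.array([2.0, 0.0, 0.0])
y = np.array([0.0, 3.0, 4.0])                 # <x, y> = 0, so x and y are orthogonal
lhs = np.linalg.norm(x + y) ** 2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))                   # True: ||x + y||^2 = ||x||^2 + ||y||^2
```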
Linear independence of orthogonal vectors
A set of nonzero orthogonal vectors is linearly independent. To see this, suppose the vectors $\boldsymbol{\varphi}_i$ satisfy a linear relation

$$\alpha_1\,\boldsymbol{\varphi}_1 + \alpha_2\,\boldsymbol{\varphi}_2 + \dots + \alpha_n\,\boldsymbol{\varphi}_n = \mathbf{0}$$
Taking the inner product of both sides with $\boldsymbol{\varphi}_j$ gives

$$\alpha_j\,\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_j \rangle = 0 \quad \implies \quad \alpha_j = 0~~\forall j$$
since

$$\langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle = 0 \quad {\text{if}}~i \neq j~.$$
Therefore the only linear combination of the $\boldsymbol{\varphi}_i$ that vanishes is the trivial one, and the vectors are linearly independent.
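As a numerical illustration (not part of the proof), a matrix whose columns are nonzero, mutually orthogonal vectors has full column rank; the vectors below are arbitrary:

```python
import numpy as np

# Columns phi_1, phi_2, phi_3 are mutually orthogonal.
Phi = np.array([[1.0,  1.0, 0.0],
                [1.0, -1.0, 0.0],
                [0.0,  0.0, 2.0]]).T

print(Phi.T @ Phi)                    # diagonal: pairwise inner products vanish
print(np.linalg.matrix_rank(Phi))     # 3, i.e. full rank => linearly independent
```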
Expressing a vector in terms of an orthogonal basis
If we have a basis $\{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\}$ and wish to express a vector $\mathbf{f}$ in terms of it, we have

$$\mathbf{f} = \sum_{j=1}^n \beta_j\,\boldsymbol{\varphi}_j$$
The problem is to find the coefficients $\beta_j$. If we take the inner product with respect to $\boldsymbol{\varphi}_i$, we get

$$\langle \mathbf{f}, \boldsymbol{\varphi}_i \rangle = \sum_{j=1}^n \beta_j\,\langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle$$
In matrix form,

$$\boldsymbol{\eta} = \boldsymbol{B}\,\boldsymbol{\beta}$$

where $B_{ij} = \langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle$ and $\eta_i = \langle \mathbf{f}, \boldsymbol{\varphi}_i \rangle$.
In general, finding the $\beta_j$'s involves inverting the $n \times n$ matrix $\boldsymbol{B}$. If the basis is orthonormal, $\boldsymbol{B}$ is the identity matrix $\boldsymbol{I}_n$, because $\langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. If the $\boldsymbol{\varphi}_i$'s are merely orthogonal, $\boldsymbol{B}$ is diagonal and we have
$$\beta_j = \cfrac{\langle \mathbf{f}, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}$$
and the quantity

$$\mathbf{p} = \cfrac{\langle \mathbf{f}, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\boldsymbol{\varphi}_j$$
is called the projection of $\mathbf{f}$ onto $\boldsymbol{\varphi}_j$.
Therefore the sum

$$\mathbf{f} = \sum_j \beta_j\,\boldsymbol{\varphi}_j$$

says that $\mathbf{f}$ is just the sum of its projections onto the orthogonal basis vectors.
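A minimal sketch of this recipe, assuming an orthogonal (but not normalized) basis of $\mathbb{R}^3$ and an arbitrary vector $\mathbf{f}$:

```python
import numpy as np

# Orthogonal (but not normalized) basis of R^3 and an arbitrary vector f.
basis = [np.array([1.0,  1.0, 0.0]),
         np.array([1.0, -1.0, 0.0]),
         np.array([0.0,  0.0, 2.0])]
f = np.array([3.0, 1.0, 5.0])

# beta_j = <f, phi_j> / ||phi_j||^2, and f is the sum of its projections.
beta = [np.dot(f, phi) / np.dot(phi, phi) for phi in basis]
f_reconstructed = sum(b * phi for b, phi in zip(beta, basis))

print(beta)                              # [2.0, 1.0, 2.5]
print(np.allclose(f, f_reconstructed))   # True
```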
Projection operation.
Let us check whether $\mathbf{p}$ is actually a projection. Let

$$\mathbf{a} = \mathbf{f} - \mathbf{p} = \mathbf{f} - \cfrac{\langle \mathbf{f}, \boldsymbol{\varphi} \rangle}{\lVert \boldsymbol{\varphi} \rVert^2}\,\boldsymbol{\varphi}$$
Then,

$$\langle \mathbf{a}, \boldsymbol{\varphi} \rangle = \langle \mathbf{f}, \boldsymbol{\varphi} \rangle - \cfrac{\langle \mathbf{f}, \boldsymbol{\varphi} \rangle}{\lVert \boldsymbol{\varphi} \rVert^2}\,\langle \boldsymbol{\varphi}, \boldsymbol{\varphi} \rangle = 0$$
Therefore $\mathbf{a}$ and $\boldsymbol{\varphi}$ are indeed orthogonal.
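The same check can be carried out numerically; the vectors below are arbitrary test data:

```python
import numpy as np

f = np.array([3.0, 1.0, 5.0])
phi = np.array([1.0, 1.0, 0.0])
p = (np.dot(f, phi) / np.dot(phi, phi)) * phi   # projection of f onto phi
a = f - p                                       # residual
print(np.isclose(np.dot(a, phi), 0.0))          # True: a is orthogonal to phi
```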
Note that we can normalize $\boldsymbol{\varphi}_i$ by defining

$$\tilde{\boldsymbol{\varphi}}_i = \cfrac{\boldsymbol{\varphi}_i}{\lVert \boldsymbol{\varphi}_i \rVert}$$
Then the basis $\{\tilde{\boldsymbol{\varphi}}_1, \tilde{\boldsymbol{\varphi}}_2, \dots, \tilde{\boldsymbol{\varphi}}_n\}$ is called an orthonormal basis.
It follows from the equation for $\beta_j$ that

$$\tilde{\beta}_j = \langle \mathbf{f}, \tilde{\boldsymbol{\varphi}}_j \rangle$$
and

$$\mathbf{f} = \sum_{j=1}^n \tilde{\beta}_j\,\tilde{\boldsymbol{\varphi}}_j$$
You can think of the vectors $\tilde{\boldsymbol{\varphi}}_i$ as orthogonal unit vectors in an $n$-dimensional space.
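A short sketch of normalizing an orthogonal basis and then recovering the coefficients as plain inner products (arbitrary test data):

```python
import numpy as np

basis = [np.array([1.0,  1.0, 0.0]),
         np.array([1.0, -1.0, 0.0]),
         np.array([0.0,  0.0, 2.0])]            # orthogonal basis of R^3
f = np.array([3.0, 1.0, 5.0])

basis_tilde = [phi / np.linalg.norm(phi) for phi in basis]   # orthonormal basis
beta_tilde = [np.dot(f, phi_t) for phi_t in basis_tilde]     # beta~_j = <f, phi~_j>

f_reconstructed = sum(b * phi_t for b, phi_t in zip(beta_tilde, basis_tilde))
print(np.allclose(f, f_reconstructed))                        # True
```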
Biorthogonal basis
However, using an orthogonal basis is not the only way to do things. An alternative that is useful (for instance when using wavelets) is the biorthonormal basis.
The problem in this case is converted into one where, given any basis $\{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\}$, we want to find another set of vectors $\{\boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \dots, \boldsymbol{\psi}_n\}$ such that

$$\langle \boldsymbol{\varphi}_i, \boldsymbol{\psi}_j \rangle = \delta_{ij}$$
In that case, if

$$\mathbf{f} = \sum_{j=1}^n \beta_j\,\boldsymbol{\varphi}_j$$
it follows that

$$\langle \mathbf{f}, \boldsymbol{\psi}_k \rangle = \sum_{j=1}^n \beta_j\,\langle \boldsymbol{\varphi}_j, \boldsymbol{\psi}_k \rangle = \beta_k$$
So the coefficients $\beta_k$ can easily be recovered. You can see a schematic of the two sets of vectors in the adjacent figure.
Biorthonormal basis
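In finite dimensions, one way (among others) to construct such a dual set for the standard inner product on $\mathbb{R}^n$ is to take the columns of the inverse transpose of the matrix whose columns are the $\boldsymbol{\varphi}_i$'s; a minimal sketch with an arbitrary basis:

```python
import numpy as np

# Columns of Phi form a (non-orthogonal) basis of R^3; this basis is arbitrary.
Phi = np.array([[1.0, 1.0, 0.0],
                [0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])

# Columns of Psi = inverse transpose of Phi satisfy <phi_i, psi_j> = delta_ij.
Psi = np.linalg.inv(Phi).T
print(np.allclose(Phi.T @ Psi, np.eye(3)))   # True: biorthogonality

# Coefficients of f in the phi-basis are recovered as beta_k = <f, psi_k>.
f = np.array([2.0, 3.0, 4.0])
beta = Psi.T @ f
print(np.allclose(Phi @ beta, f))            # True: f = sum_k beta_k phi_k
```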
Gram-Schmidt orthogonalization
One technique for getting an orthogonal basis is to use the process of Gram-Schmidt orthogonalization.
The goal is to produce an orthogonal set of vectors $\{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\}$ given a linearly independent set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$.
We start off by setting $\boldsymbol{\varphi}_1 = \mathbf{x}_1$. Then $\boldsymbol{\varphi}_2$ is given by subtracting the projection of $\mathbf{x}_2$ onto $\boldsymbol{\varphi}_1$ from $\mathbf{x}_2$, i.e.,

$$\boldsymbol{\varphi}_2 = \mathbf{x}_2 - \cfrac{\langle \mathbf{x}_2, \boldsymbol{\varphi}_1 \rangle}{\lVert \boldsymbol{\varphi}_1 \rVert^2}\,\boldsymbol{\varphi}_1$$
Thus $\boldsymbol{\varphi}_2$ is clearly orthogonal to $\boldsymbol{\varphi}_1$. For $\boldsymbol{\varphi}_3$ we use

$$\boldsymbol{\varphi}_3 = \mathbf{x}_3 - \cfrac{\langle \mathbf{x}_3, \boldsymbol{\varphi}_1 \rangle}{\lVert \boldsymbol{\varphi}_1 \rVert^2}\,\boldsymbol{\varphi}_1 - \cfrac{\langle \mathbf{x}_3, \boldsymbol{\varphi}_2 \rangle}{\lVert \boldsymbol{\varphi}_2 \rVert^2}\,\boldsymbol{\varphi}_2$$
More generally,

$$\boldsymbol{\varphi}_n = \mathbf{x}_n - \sum_{j=1}^{n-1} \cfrac{\langle \mathbf{x}_n, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\boldsymbol{\varphi}_j$$
If you want an orthonormal set, you can obtain one by normalizing the resulting orthogonal vectors.
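A compact NumPy sketch of the procedure (classical Gram-Schmidt; the helper name `gram_schmidt` is hypothetical, and the input columns are assumed linearly independent):

```python
import numpy as np

def gram_schmidt(X):
    """Replace the columns of X (assumed linearly independent) by mutually
    orthogonal columns spanning the same space (classical Gram-Schmidt)."""
    Phi = np.zeros_like(X, dtype=float)
    for n in range(X.shape[1]):
        v = X[:, n].astype(float)
        for j in range(n):
            phi_j = Phi[:, j]
            v -= (np.dot(X[:, n], phi_j) / np.dot(phi_j, phi_j)) * phi_j
        Phi[:, n] = v
    return Phi

X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])
Phi = gram_schmidt(X)
print(np.round(Phi.T @ Phi, 10))   # off-diagonal entries are (numerically) zero
```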
We can check by induction that the vectors $\boldsymbol{\varphi}_j$ are indeed orthogonal.
Assume that the vectors $\boldsymbol{\varphi}_j$ for $j \leq n-1$ are mutually orthogonal, and pick any $k < n$. Then
$$\langle \boldsymbol{\varphi}_n, \boldsymbol{\varphi}_k \rangle = \langle \mathbf{x}_n, \boldsymbol{\varphi}_k \rangle - \sum_{j=1}^{n-1} \cfrac{\langle \mathbf{x}_n, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_k \rangle$$
Now $\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_k \rangle = 0$ unless $j = k$, so only the $j = k$ term of the sum survives. That term equals $\cfrac{\langle \mathbf{x}_n, \boldsymbol{\varphi}_k \rangle}{\lVert \boldsymbol{\varphi}_k \rVert^2}\,\lVert \boldsymbol{\varphi}_k \rVert^2 = \langle \mathbf{x}_n, \boldsymbol{\varphi}_k \rangle$, which cancels the first term, so $\langle \boldsymbol{\varphi}_n, \boldsymbol{\varphi}_k \rangle = 0$. Hence the vectors are orthogonal.
Note that you have to be careful when computing an orthogonal basis numerically with the Gram-Schmidt technique, because rounding errors accumulate in the projection terms under the sum.
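To illustrate, the hedged sketch below applies the classical procedure to a deliberately ill-conditioned set of vectors (the columns of a Hilbert matrix, an arbitrary choice) and compares the resulting loss of orthogonality with a library QR factorization:

```python
import numpy as np

def gram_schmidt(X):
    # Classical Gram-Schmidt as sketched earlier, followed by normalization.
    Phi = np.zeros_like(X, dtype=float)
    for n in range(X.shape[1]):
        v = X[:, n].astype(float)
        for j in range(n):
            phi_j = Phi[:, j]
            v -= (np.dot(X[:, n], phi_j) / np.dot(phi_j, phi_j)) * phi_j
        Phi[:, n] = v
    return Phi / np.linalg.norm(Phi, axis=0)

n = 10
X = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n)[None, :])   # Hilbert matrix

Q_cgs = gram_schmidt(X)
Q_ref, _ = np.linalg.qr(X)

# Departure from orthonormality, ||Q^T Q - I||, is far larger for classical Gram-Schmidt.
print(np.linalg.norm(Q_cgs.T @ Q_cgs - np.eye(n)))
print(np.linalg.norm(Q_ref.T @ Q_ref - np.eye(n)))
```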