Inner product spaces such as $\mathbb{R}^n$ can be given a family of $L_p$ norms, defined componentwise as
\[
  \lVert \mathbf{x} \rVert_p = \left( \sum_k |x_k|^p \right)^{1/p}, \qquad p = 1, 2, \dots, \infty\,.
\]
When $p = 1$, we get the $L_1$ norm
\[
  \lVert \mathbf{x} \rVert_1 = \sum_k |x_k|\,.
\]
When $p = 2$, we get the $L_2$ norm
\[
  \lVert \mathbf{x} \rVert_2 = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}\,.
\]
In the limit as $p \rightarrow \infty$ we get the $L_\infty$ norm, or sup norm,
\[
  \lVert \mathbf{x} \rVert_\infty = \max_k |x_k|\,.
\]
The adjacent figure shows a geometric interpretation of the three norms.
Geometric interpretation of various norms
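As a quick numerical illustration (not part of the original text), here is a minimal NumPy sketch that evaluates the three norms for an arbitrary example vector and compares them with NumPy's built-in `np.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])    # an arbitrary example vector

# L1 norm: sum of the absolute values of the components
l1 = np.sum(np.abs(x))
# L2 norm: square root of the inner product <x, x>
l2 = np.sqrt(np.dot(x, x))
# L-infinity (sup) norm: largest absolute component
linf = np.max(np.abs(x))

print(l1, l2, linf)
# These agree with NumPy's built-in norms:
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
```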
If a vector space has an inner product, then the norm
\[
  \lVert \mathbf{x} \rVert = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} = \lVert \mathbf{x} \rVert_2
\]
is called the induced norm. Clearly, the induced norm is nonnegative and zero only if $\mathbf{x} = \mathbf{0}$. It is also homogeneous: $\lVert \alpha\,\mathbf{x} \rVert = |\alpha|\,\lVert \mathbf{x} \rVert$ for any scalar $\alpha$. You can think of the induced norm as a measure of length for the vector space.
Some useful results that follow from the definition of the norm are discussed below.
In an inner product space, the Cauchy-Schwarz inequality holds:
\[
  |\langle \mathbf{x}, \mathbf{y} \rangle| \le \lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert\,.
\]
Proof
This statement is clearly true if $\mathbf{y} = \mathbf{0}$. If $\mathbf{y} \neq \mathbf{0}$, then for any scalar $\alpha$ we have
\[
  0 \le \lVert \mathbf{x} - \alpha\,\mathbf{y} \rVert^2
  = \langle \mathbf{x} - \alpha\,\mathbf{y},\, \mathbf{x} - \alpha\,\mathbf{y} \rangle
  = \langle \mathbf{x}, \mathbf{x} \rangle
    - \langle \mathbf{x}, \alpha\,\mathbf{y} \rangle
    - \langle \alpha\,\mathbf{y}, \mathbf{x} \rangle
    + |\alpha|^2\,\langle \mathbf{y}, \mathbf{y} \rangle\,.
\]
Now
\[
  \langle \mathbf{x}, \alpha\,\mathbf{y} \rangle + \langle \alpha\,\mathbf{y}, \mathbf{x} \rangle
  = \bar{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle + \alpha\,\langle \mathbf{y}, \mathbf{x} \rangle
  = \bar{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle + \overline{\bar{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle}
  = 2\,\text{Re}\!\left(\bar{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle\right)\,.
\]
Therefore,
\[
  \lVert \mathbf{x} \rVert^2
  - 2\,\text{Re}\!\left(\bar{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle\right)
  + |\alpha|^2\,\lVert \mathbf{y} \rVert^2 \ge 0\,.
\]
Let us choose $\alpha$ such that it minimizes the left hand side above. This value is clearly
\[
  \alpha = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\lVert \mathbf{y} \rVert^2}\,,
\]
which gives us
\[
  \lVert \mathbf{x} \rVert^2
  - 2\,\frac{|\langle \mathbf{x}, \mathbf{y} \rangle|^2}{\lVert \mathbf{y} \rVert^2}
  + \frac{|\langle \mathbf{x}, \mathbf{y} \rangle|^2}{\lVert \mathbf{y} \rVert^2} \ge 0\,.
\]
Therefore,
\[
  \lVert \mathbf{x} \rVert^2\,\lVert \mathbf{y} \rVert^2 \ge |\langle \mathbf{x}, \mathbf{y} \rangle|^2\,,
\]
and taking square roots gives the desired inequality. $\square$
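As a sanity check, the Cauchy-Schwarz inequality can be verified numerically; the randomly chosen vectors in the sketch below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = abs(np.dot(x, y))                        # |<x, y>|
rhs = np.linalg.norm(x) * np.linalg.norm(y)    # ||x|| ||y||
print(lhs <= rhs)                              # True for any choice of x and y
print(lhs, rhs)
```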
The triangle inequality states that
\[
  \lVert \mathbf{x} + \mathbf{y} \rVert \le \lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert\,.
\]
Proof
\[
  \lVert \mathbf{x} + \mathbf{y} \rVert^2
  = \lVert \mathbf{x} \rVert^2 + 2\,\text{Re}\,\langle \mathbf{x}, \mathbf{y} \rangle + \lVert \mathbf{y} \rVert^2\,.
\]
Since $\text{Re}\,\langle \mathbf{x}, \mathbf{y} \rangle \le |\langle \mathbf{x}, \mathbf{y} \rangle|$, the Cauchy-Schwarz inequality gives
\[
  \lVert \mathbf{x} + \mathbf{y} \rVert^2
  \le \lVert \mathbf{x} \rVert^2 + 2\,\lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert + \lVert \mathbf{y} \rVert^2
  = \left( \lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert \right)^2\,.
\]
Hence
\[
  \lVert \mathbf{x} + \mathbf{y} \rVert \le \lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert\,. \qquad \square
\]
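The triangle inequality can be checked the same way; again, the random vectors below are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = np.linalg.norm(x + y)                        # ||x + y||
rhs = np.linalg.norm(x) + np.linalg.norm(y)        # ||x|| + ||y||
print(lhs <= rhs)                                  # True for any choice of x and y
print(lhs, rhs)
```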
Angle between two vectors
In $\mathbb{R}^2$ or $\mathbb{R}^3$ we have
\[
  \cos\theta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert}\,.
\]
So it makes sense to define $\cos\theta$ in this way for any real inner product space.
We then have
\[
  \lVert \mathbf{x} + \mathbf{y} \rVert^2
  = \lVert \mathbf{x} \rVert^2 + 2\,\lVert \mathbf{x} \rVert\,\lVert \mathbf{y} \rVert \cos\theta + \lVert \mathbf{y} \rVert^2\,.
\]
In particular, if $\cos\theta = 0$ we have an analog of the Pythagorean theorem:
\[
  \lVert \mathbf{x} + \mathbf{y} \rVert^2 = \lVert \mathbf{x} \rVert^2 + \lVert \mathbf{y} \rVert^2\,.
\]
In that case the vectors are said to be orthogonal.
If $\langle \mathbf{x}, \mathbf{y} \rangle = 0$, then the vectors are said to be orthogonal even in a complex vector space.
Orthogonal vectors have a lot of nice properties.
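Before moving on, here is a small numerical check of the cosine formula and the Pythagorean relation above (an illustrative sketch; the vectors are chosen only for the example):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 1.0, -2.0])     # chosen so that <x, y> = 2 + 2 - 4 = 0

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta)                   # 0.0, so the vectors are orthogonal

# Pythagorean check: ||x + y||^2 equals ||x||^2 + ||y||^2
print(np.linalg.norm(x + y)**2, np.linalg.norm(x)**2 + np.linalg.norm(y)**2)
```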
Linear independence of orthogonal vectors
A set of nonzero orthogonal vectors is linearly independent.
Suppose the orthogonal vectors $\boldsymbol{\varphi}_i$ satisfy
\[
  \alpha_1\,\boldsymbol{\varphi}_1 + \alpha_2\,\boldsymbol{\varphi}_2 + \dots + \alpha_n\,\boldsymbol{\varphi}_n = \mathbf{0}\,.
\]
Taking the inner product with $\boldsymbol{\varphi}_j$ gives
\[
  \alpha_j\,\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_j \rangle = 0
  \quad \implies \quad \alpha_j = 0 \;\;\forall\, j
\]
since
\[
  \langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle = 0 \quad \text{if } i \neq j
\]
and $\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_j \rangle \neq 0$ for nonzero vectors. Since every coefficient must vanish, the only linear combination that gives the zero vector is the trivial one, and therefore the vectors are linearly independent.
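The same conclusion can be seen numerically: stacking nonzero orthogonal vectors as the rows of a matrix gives a diagonal Gram matrix and full rank. The vectors below are chosen only for illustration:

```python
import numpy as np

# Three mutually orthogonal, nonzero vectors in R^3, stored as rows
phi = np.array([[1.0,  1.0, 0.0],
                [1.0, -1.0, 0.0],
                [0.0,  0.0, 2.0]])

print(phi @ phi.T)                   # diagonal Gram matrix: off-diagonal inner products vanish
print(np.linalg.matrix_rank(phi))    # 3, so the vectors are linearly independent
```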
Expressing a vector in terms of an orthogonal basis
If we have a basis $\{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\}$ and wish to express a vector $\mathbf{f}$ in terms of it, we have
\[
  \mathbf{f} = \sum_{j=1}^n \beta_j\,\boldsymbol{\varphi}_j\,.
\]
The problem is to find the coefficients $\beta_j$.
If we take the inner product with respect to $\boldsymbol{\varphi}_i$, we get
\[
  \langle \mathbf{f}, \boldsymbol{\varphi}_i \rangle
  = \sum_{j=1}^n \beta_j\,\langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle\,.
\]
In matrix form,
\[
  \boldsymbol{\eta} = \boldsymbol{B}\,\boldsymbol{\beta}
\]
where $B_{ij} = \langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle$ and $\eta_i = \langle \mathbf{f}, \boldsymbol{\varphi}_i \rangle$.
In general, finding the $\beta_j$ requires inverting the $n \times n$ matrix $\boldsymbol{B}$. If the basis is orthogonal, however, $\boldsymbol{B}$ is diagonal, and if it is orthonormal then $\boldsymbol{B}$ is the identity matrix $\boldsymbol{I}_n$, because $\langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. Provided that the $\boldsymbol{\varphi}_i$ are orthogonal, we therefore have
\[
  \beta_j = \frac{\langle \mathbf{f}, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}
\]
and the quantity
\[
  \mathbf{p} = \frac{\langle \mathbf{f}, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\boldsymbol{\varphi}_j
\]
is called the projection of $\mathbf{f}$ onto $\boldsymbol{\varphi}_j$.
Therefore the expansion
\[
  \mathbf{f} = \sum_j \beta_j\,\boldsymbol{\varphi}_j
\]
says that $\mathbf{f}$ is just the sum of its projections onto the orthogonal basis vectors.
Projection operation.
Let us check whether $\mathbf{p}$ is actually a projection. Let
\[
  \mathbf{a} = \mathbf{f} - \mathbf{p}
  = \mathbf{f} - \frac{\langle \mathbf{f}, \boldsymbol{\varphi} \rangle}{\lVert \boldsymbol{\varphi} \rVert^2}\,\boldsymbol{\varphi}\,.
\]
Then,
\[
  \langle \mathbf{a}, \boldsymbol{\varphi} \rangle
  = \langle \mathbf{f}, \boldsymbol{\varphi} \rangle
    - \frac{\langle \mathbf{f}, \boldsymbol{\varphi} \rangle}{\lVert \boldsymbol{\varphi} \rVert^2}\,\langle \boldsymbol{\varphi}, \boldsymbol{\varphi} \rangle
  = 0\,.
\]
Therefore $\mathbf{a}$ and $\boldsymbol{\varphi}$ are indeed orthogonal.
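A short NumPy sketch (illustrative only; the basis and the vector f are arbitrary choices) that computes the coefficients $\beta_j = \langle \mathbf{f}, \boldsymbol{\varphi}_j \rangle / \lVert \boldsymbol{\varphi}_j \rVert^2$, reconstructs $\mathbf{f}$ from its projections, and checks that the residual of a single projection is orthogonal to that basis vector:

```python
import numpy as np

# An orthogonal (but not orthonormal) basis of R^3, stored as rows
phi = np.array([[1.0,  1.0, 0.0],
                [1.0, -1.0, 0.0],
                [0.0,  0.0, 2.0]])
f = np.array([3.0, 1.0, 5.0])

# Expansion coefficients beta_j = <f, phi_j> / ||phi_j||^2
beta = (phi @ f) / np.sum(phi**2, axis=1)
print(beta)

# f is the sum of its projections onto the basis vectors
print(np.allclose(f, beta @ phi))        # True

# The residual of a single projection is orthogonal to that basis vector
p = beta[0] * phi[0]
a = f - p
print(np.dot(a, phi[0]))                 # 0.0
```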
Note that we can normalize $\boldsymbol{\varphi}_i$ by defining
\[
  \tilde{\boldsymbol{\varphi}}_i = \frac{\boldsymbol{\varphi}_i}{\lVert \boldsymbol{\varphi}_i \rVert}\,.
\]
Then the basis $\{\tilde{\boldsymbol{\varphi}}_1, \tilde{\boldsymbol{\varphi}}_2, \dots, \tilde{\boldsymbol{\varphi}}_n\}$ is called an orthonormal basis.
It follows from the equation for $\beta_j$ that
\[
  \tilde{\beta}_j = \langle \mathbf{f}, \tilde{\boldsymbol{\varphi}}_j \rangle
\]
and
\[
  \mathbf{f} = \sum_{j=1}^n \tilde{\beta}_j\,\tilde{\boldsymbol{\varphi}}_j\,.
\]
You can think of the vectors $\tilde{\boldsymbol{\varphi}}_i$ as orthogonal unit vectors in an $n$-dimensional space.
However, using an orthogonal basis is not the only way to do things. An alternative that is useful (for instance when using wavelets) is the biorthonormal basis. The problem in this case becomes: given any basis $\{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\}$, find another set of vectors $\{\boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \dots, \boldsymbol{\psi}_n\}$ such that
\[
  \langle \boldsymbol{\varphi}_i, \boldsymbol{\psi}_j \rangle = \delta_{ij}\,.
\]
In that case, if
\[
  \mathbf{f} = \sum_{j=1}^n \beta_j\,\boldsymbol{\varphi}_j
\]
it follows that
\[
  \langle \mathbf{f}, \boldsymbol{\psi}_k \rangle
  = \sum_{j=1}^n \beta_j\,\langle \boldsymbol{\varphi}_j, \boldsymbol{\psi}_k \rangle = \beta_k\,.
\]
So the coefficients $\beta_k$ can easily be recovered. You can see a schematic of the two sets of vectors in the adjacent figure.
Biorthonormal basis
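In finite dimensions one simple way to construct such a dual set is from the inverse of the matrix whose rows are the $\boldsymbol{\varphi}_i$; the sketch below (an illustration under that assumption, not the wavelet construction) builds the dual basis and recovers the coefficients:

```python
import numpy as np

# A non-orthogonal basis of R^3, stored as the rows of Phi
Phi = np.array([[1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [1.0, 1.0, 1.0]])

# Dual (biorthogonal) set: the rows of Psi must satisfy <phi_i, psi_j> = delta_ij,
# i.e. Phi @ Psi.T = I, so Psi = inv(Phi).T
Psi = np.linalg.inv(Phi).T
print(np.round(Phi @ Psi.T, 12))       # identity matrix

# If f = sum_j beta_j phi_j, then beta_k = <f, psi_k>
beta = np.array([2.0, -1.0, 3.0])      # arbitrary coefficients
f = beta @ Phi
print(Psi @ f)                         # recovers [ 2. -1.  3.]
```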
Gram-Schmidt orthogonalization
One technique for getting an orthogonal basis is to use the process of Gram-Schmidt orthogonalization.
The goal is to produce an orthogonal set of vectors $\{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \dots, \boldsymbol{\varphi}_n\}$ given a linearly independent set $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$.
We start by setting $\boldsymbol{\varphi}_1 = \mathbf{x}_1$. Then $\boldsymbol{\varphi}_2$ is obtained by subtracting the projection of $\mathbf{x}_2$ onto $\boldsymbol{\varphi}_1$ from $\mathbf{x}_2$, i.e.,
\[
  \boldsymbol{\varphi}_2 = \mathbf{x}_2
  - \frac{\langle \mathbf{x}_2, \boldsymbol{\varphi}_1 \rangle}{\lVert \boldsymbol{\varphi}_1 \rVert^2}\,\boldsymbol{\varphi}_1\,.
\]
Thus $\boldsymbol{\varphi}_2$ is clearly orthogonal to $\boldsymbol{\varphi}_1$. For $\boldsymbol{\varphi}_3$ we use
\[
  \boldsymbol{\varphi}_3 = \mathbf{x}_3
  - \frac{\langle \mathbf{x}_3, \boldsymbol{\varphi}_1 \rangle}{\lVert \boldsymbol{\varphi}_1 \rVert^2}\,\boldsymbol{\varphi}_1
  - \frac{\langle \mathbf{x}_3, \boldsymbol{\varphi}_2 \rangle}{\lVert \boldsymbol{\varphi}_2 \rVert^2}\,\boldsymbol{\varphi}_2\,.
\]
More generally,
\[
  \boldsymbol{\varphi}_n = \mathbf{x}_n
  - \sum_{j=1}^{n-1} \frac{\langle \mathbf{x}_n, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\boldsymbol{\varphi}_j\,.
\]
If you want an orthonormal set, you can obtain one by normalizing each of the orthogonal vectors.
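A minimal sketch of the classical Gram-Schmidt process described above (the function name and the final normalization step are my additions, not from the text):

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: turn the rows of X into orthogonal rows."""
    Phi = []
    for x in X:
        phi = np.array(x, dtype=float)
        # Subtract the projection of x onto every previously computed phi_j
        for p in Phi:
            phi -= (np.dot(x, p) / np.dot(p, p)) * p
        Phi.append(phi)
    return np.array(Phi)

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Phi = gram_schmidt(X)
print(np.round(Phi @ Phi.T, 12))       # diagonal Gram matrix: the rows are orthogonal

# Normalize to get an orthonormal set if desired
Phi_tilde = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
print(np.round(Phi_tilde @ Phi_tilde.T, 12))   # identity matrix
```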
We can check that the vectors $\boldsymbol{\varphi}_j$ are indeed orthogonal by induction. Assume that the vectors $\boldsymbol{\varphi}_j$, $j \le n-1$, are mutually orthogonal. Pick $k < n$. Then
\[
  \langle \boldsymbol{\varphi}_n, \boldsymbol{\varphi}_k \rangle
  = \langle \mathbf{x}_n, \boldsymbol{\varphi}_k \rangle
  - \sum_{j=1}^{n-1} \frac{\langle \mathbf{x}_n, \boldsymbol{\varphi}_j \rangle}{\lVert \boldsymbol{\varphi}_j \rVert^2}\,\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_k \rangle\,.
\]
Now $\langle \boldsymbol{\varphi}_j, \boldsymbol{\varphi}_k \rangle = 0$ unless $j = k$, so only the $j = k$ term survives in the sum. That term equals $\langle \mathbf{x}_n, \boldsymbol{\varphi}_k \rangle$, which cancels the first term, and therefore $\langle \boldsymbol{\varphi}_n, \boldsymbol{\varphi}_k \rangle = 0$. Hence the vectors are orthogonal.
Note that you have to be careful when computing an orthogonal basis numerically with the Gram-Schmidt technique, because rounding errors accumulate in the projection terms under the sum and the computed vectors gradually lose orthogonality.
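One way to see this error growth (an illustrative sketch; the nearly dependent test vectors and the comparison with the modified variant are my choices, not from the text) is to orthogonalize nearly linearly dependent vectors and measure how far the normalized Gram matrix is from the identity. The modified Gram-Schmidt variant, which projects the running vector rather than the original one, behaves much better:

```python
import numpy as np

def classical_gs(X):
    """Classical Gram-Schmidt: project the original vector x onto each phi_j."""
    Phi = []
    for x in X:
        phi = np.array(x, dtype=float)
        for p in Phi:
            phi -= (np.dot(x, p) / np.dot(p, p)) * p
        Phi.append(phi)
    return np.array(Phi)

def modified_gs(X):
    """Modified Gram-Schmidt: project the running vector phi onto each phi_j."""
    Phi = []
    for x in X:
        phi = np.array(x, dtype=float)
        for p in Phi:
            phi -= (np.dot(phi, p) / np.dot(p, p)) * p
        Phi.append(phi)
    return np.array(Phi)

def orthogonality_error(Phi):
    """Maximum deviation of the normalized Gram matrix from the identity."""
    Q = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    return np.max(np.abs(Q @ Q.T - np.eye(len(Q))))

# Nearly linearly dependent vectors exaggerate round-off in the projection sums
eps = 1e-8
X = np.array([[1.0, eps, 0.0, 0.0],
              [1.0, 0.0, eps, 0.0],
              [1.0, 0.0, 0.0, eps]])

print(orthogonality_error(classical_gs(X)))   # large loss of orthogonality
print(orthogonality_error(modified_gs(X)))    # much smaller error
```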