# Least-Squares Method

## Content summary

A brief introduction to the least-squares method and its statistical meaning.

## Goals

This learning project offers learning activities and some applications of the least-squares method. With this project, one should understand the intention of the least-squares method and what it means, and should be able to apply simple least-squares methods to find a good approximation for a given function. For a more mathematical treatment, see the page "Least squares".

## Learning materials

### Texts

[1] Numerical Mathematics and Computing Chapter 12.1

[2] Numerical Method for Engineers: With Software and Programming Applications Chapter 17.3

[3] Statistics for Management and Economics Chapter 17.1

[4] T.Strutz: Data Fitting and Uncertainty. A practical introduction to weighted least squares and beyond. 2nd edition, Springer Vieweg, 2016, ISBN 978-3-658-11455-8.

### Lessons

#### Lesson 1: Introduction to Least-Squares Method

The goal of the least-squares method is to find a good estimate of the parameters of a model function f(x) that fits a set of data points ${\displaystyle x_{1}...x_{n}}$ . The least-squares method requires that the estimated function deviate as little as possible from the data in the sense of the 2-norm. Generally speaking, least-squares methods fall into two categories, linear and non-linear. They can be classified further as ordinary least squares (OLS), weighted least squares (WLS), alternating least squares (ALS), and partial least squares (PLS).

To best fit a set of data, the least-squares method minimizes the sum of squared residuals (also called the sum of squared errors, SSE)

${\displaystyle S=\sum _{i=1}^{i=n}r_{i}^{2}}$ ,

where ${\displaystyle r_{i}}$ , the residual, is the difference between the observed value and the value predicted by the model, defined as

${\displaystyle r_{i}=y_{i}-f(x_{i})}$

where the n data pairs are ${\displaystyle (x_{i},y_{i})\!}$ , and the model function is ${\displaystyle f(x_{i})}$ .

Here, we can choose the parameters of f(x) so that the approximated function best fits the data set.

For example, in the graph on the right, R3 = Y3 - f(X3), and ${\displaystyle (R1^{2}+R2^{2}+R3^{2}+R4^{2}+R5^{2})}$ , the sum of the squares of the red lines' lengths, is the quantity we want to minimize.
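The sum of squared residuals can be computed directly; here is a minimal sketch, with made-up data points and a made-up candidate model chosen only to illustrate the idea:

```python
# Sum of squared residuals (SSE) for a candidate model f(x).
def sse(f, points):
    """Return S = sum of r_i^2, where r_i = y_i - f(x_i)."""
    return sum((y - f(x)) ** 2 for x, y in points)

# Hypothetical data points and the candidate model f(x) = 2x.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
line = lambda x: 2 * x

print(sse(line, points))  # residuals 0.1, -0.1, 0.2, -0.2, 0.1 -> S = 0.11
```

Least-squares fitting then amounts to choosing the parameters of f so that this quantity is as small as possible.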

#### Lesson 2: Linear Least-Squares Method

The linear least-squares (LLS) method assumes that the data set falls on a straight line. Therefore, ${\displaystyle f(x)=ax+b}$ , where a and b are constants. However, due to experimental error, some data points might not lie exactly on the line; there will be error (residual) between the estimated function and the real data. The linear least-squares method (or ${\displaystyle l_{2}}$  approximation) defines the best-fit function as the one that minimizes ${\displaystyle S=\sum _{i=1}^{i=n}(y_{i}-(ax_{i}+b))^{2}}$

The advantages of LLS:

1. If we assume that the errors have a normal probability distribution, then minimizing S gives us the best estimates of a and b.

2. We can easily use calculus to determine the approximated values of a and b.

To minimize S, the following conditions must be satisfied: ${\displaystyle {\frac {\partial S}{\partial a}}=0}$  and ${\displaystyle {\frac {\partial S}{\partial b}}=0}$ .

Taking the partial derivatives, we obtain ${\displaystyle \sum _{i=0}^{i=n}2((ax_{i}+b)-y_{i})x_{i}=0}$ , and ${\displaystyle \sum _{i=0}^{i=n}2((ax_{i}+b)-y_{i})=0}$ .

This system consists of two simultaneous linear equations in the two unknowns a and b. (These two equations are the so-called normal equations.)

By simple manipulation of the summations, we find that

${\displaystyle a={\frac {1}{c}}\left[(n+1)\sum _{i=0}^{i=n}x_{i}y_{i}-\left(\sum _{i=0}^{i=n}x_{i}\right)\left(\sum _{i=0}^{i=n}y_{i}\right)\right]}$  and ${\displaystyle b={\frac {1}{c}}\left[\left(\sum _{i=0}^{i=n}x_{i}^{2}\right)\left(\sum _{i=0}^{i=n}y_{i}\right)-\left(\sum _{i=0}^{i=n}x_{i}\right)\left(\sum _{i=0}^{i=n}x_{i}y_{i}\right)\right]}$

where

${\displaystyle c=(n+1)\left(\sum _{i=0}^{i=n}x_{i}^{2}\right)-\left(\sum _{i=0}^{i=n}x_{i}\right)^{2}}$ .

Thus, for the special data set ${\displaystyle (i,y_{i})}$ , where i is an integer in [1, n], the best estimated function is

${\displaystyle y=ax+b}$ , where ${\displaystyle a={\frac {1}{n^{3}-n}}\left[12\sum _{i=1}^{i=n}iy_{i}-6(n+1)\sum _{i=1}^{i=n}y_{i}\right]}$  and ${\displaystyle b={\frac {1}{n^{2}-n}}\left[(4n+2)\sum _{i=1}^{i=n}y_{i}-6\sum _{i=1}^{i=n}iy_{i}\right]}$ .
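The general closed-form expressions for a and b can be checked numerically. Here is a minimal sketch in plain Python (the data values are arbitrary illustrations, not from the text):

```python
# Closed-form linear least squares from the normal equations:
# a = (m*Sxy - Sx*Sy) / c,  b = (Sxx*Sy - Sx*Sxy) / c,  c = m*Sxx - Sx^2,
# where m is the number of data points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

m = len(xs)
sx = sum(xs)
sy = sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

c = m * sxx - sx ** 2
a = (m * sxy - sx * sy) / c      # slope
b = (sxx * sy - sx * sxy) / c    # intercept

print(a, b)  # 1.99 1.04 (approximately)
```

For these points the line y = 1.99x + 1.04 minimizes the sum of squared residuals; any other slope or intercept gives a larger S.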

#### Lesson 3: Linear Least-Squares Method in matrix form

We can also represent the estimated linear function with the following model: ${\displaystyle y_{i}=a_{1}x_{i,1}+a_{2}x_{i,2}+...+a_{m}x_{i,m}+r_{i}}$ .

It can also be represented in matrix form: ${\displaystyle {Y}=[X]{A}+{R}}$ , where [X] is a matrix of coefficients derived from the data set (it need not be square, depending on the number of variables, m, and data points, n); the vector ${\displaystyle {Y}}$  contains the values of the dependent variable, ${\displaystyle {Y}^{T}={\begin{bmatrix}y_{1}&y_{2}&...&y_{n}\end{bmatrix}}}$ ; the vector ${\displaystyle {A}}$  contains the unknown coefficients we want to solve for, ${\displaystyle {A}^{T}={\begin{bmatrix}a_{1}&a_{2}&...&a_{m}\end{bmatrix}}}$ ; and the vector {R} contains the residuals, ${\displaystyle {R}^{T}={\begin{bmatrix}r_{1}&r_{2}&...&r_{n}\end{bmatrix}}}$ .

To minimize ${\displaystyle {R}^{T}{R}}$ , the sum of squared residuals, we follow the same method as in Lesson 2: take the partial derivative with respect to each coefficient and set it equal to zero. The result is a system of normal equations, which can be represented in the following matrix form: ${\displaystyle [[X]^{T}[X]]{A}={[X]^{T}{Y}}}$ .

To solve the system, we have many options, such as the LU method, the Cholesky method, the inverse matrix, and Gauss-Seidel. (Generally, the normal equations do not produce diagonally dominant matrices, so the Gauss-Seidel method is not recommended.)
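A sketch of this matrix formulation using NumPy, with illustrative data, fitting y = a1*x + a2 so that [X] has a column of x values and a column of ones:

```python
import numpy as np

# Solve the normal equations [X]^T [X] {A} = [X]^T {Y} directly.
# (In production code a QR- or Cholesky-based solver would be preferred
# for numerical stability; this mirrors the derivation above.)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # illustrative data
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([x, np.ones_like(x)])  # design matrix, n x m
A = np.linalg.solve(X.T @ X, X.T @ Y)      # coefficient vector {A}

print(A)  # [slope, intercept]
```

The same vector A is returned by `np.linalg.lstsq(X, Y, rcond=None)`, which avoids forming the normal equations explicitly.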


#### Lesson 4: Least-Squares Method in statistical view

From the equation ${\displaystyle [[X]^{T}[X]]{A}={[X]^{T}{Y}}}$ , we can derive the following equation: ${\displaystyle {A}=[[X]^{T}[X]]^{-1}{[X]^{T}{Y}}}$ .

From this equation, we can determine not only the coefficients but also their statistical properties.

Using calculus, the following formulas for coefficients can be obtained:

${\displaystyle a={\frac {S_{xy}}{S_{x}^{2}}}}$  and ${\displaystyle b={\bar {y}}-a{\bar {x}}}$

where

${\displaystyle S_{xy}={\frac {\sum _{i=1}^{i=n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{n-1}}}$

${\displaystyle S_{x}^{2}={\frac {\sum _{i=1}^{i=n}(x_{i}-{\bar {x}})^{2}}{n-1}}}$

${\displaystyle {\bar {x}}={\frac {\sum _{i=1}^{i=n}x_{i}}{n}}}$

${\displaystyle {\bar {y}}={\frac {\sum _{i=1}^{i=n}y_{i}}{n}}}$ .
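These sample-statistic formulas give the same line as the normal-equation solution. A quick plain-Python sketch, with made-up data:

```python
# Slope and intercept via sample covariance and variance:
# a = S_xy / S_x^2,  b = ybar - a * xbar.  Data are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
s_xx = sum((x - xbar) ** 2 for x in xs) / (n - 1)

a = s_xy / s_xx          # slope
b = ybar - a * xbar      # intercept
print(a, b)
```

Note that the (n - 1) factors cancel in the ratio S_xy / S_x^2, which is why this agrees with the raw-sum formulas of Lesson 2.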

Moreover, the diagonal and off-diagonal entries of the matrix ${\displaystyle [[X]^{T}[X]]^{-1}}$  give, up to a common factor, the variances and covariances of the coefficients ${\displaystyle a_{i}}$ , respectively.

Assume the diagonal entries of ${\displaystyle [[X]^{T}[X]]^{-1}}$  are ${\displaystyle x_{i,i}}$  and the corresponding coefficients are ${\displaystyle a_{i}}$ ; then

${\displaystyle var(a_{i-1})=x_{i,i}\,s_{y/x}^{2}}$  and ${\displaystyle cov(a_{i-1},a_{j-1})=x_{i,j}\,s_{y/x}^{2}}$

where ${\displaystyle s_{y/x}}$  is called the standard error of the estimate, and ${\displaystyle s_{y/x}={\sqrt {S \over {n-2}}}}$ .

(Here, the subscript y/x indicates that the error measures the spread of the observed y values about the values predicted by the regression for each x.)

These two pieces of information have many applications. For example, we can derive upper and lower bounds for the intercept and the slope.
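Putting the pieces together, the standard error of the estimate and the coefficient variances can be read off the diagonal of ${\displaystyle [[X]^{T}[X]]^{-1}}$. A NumPy sketch with illustrative data:

```python
import numpy as np

# Coefficient variances from the diagonal of ([X]^T [X])^(-1),
# scaled by the squared standard error of the estimate s_{y/x}^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # illustrative data
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8, 12.1])
n = len(x)

X = np.column_stack([x, np.ones_like(x)])       # design matrix
A = np.linalg.solve(X.T @ X, X.T @ y)           # [slope, intercept]

S = np.sum((y - X @ A) ** 2)                    # sum of squared residuals
s_yx = np.sqrt(S / (n - 2))                     # standard error of the estimate

cov = np.linalg.inv(X.T @ X) * s_yx ** 2        # variance-covariance matrix
var_slope, var_intercept = np.diag(cov)
print(s_yx, var_slope, var_intercept)
```

The square roots of the diagonal entries are the standard errors of the slope and intercept, from which confidence bounds follow.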

### Assignments

To better understand the application of the least-squares method, the first question will be solved by applying the LLS equations, and the second will be solved with a Matlab program.

#### Question1: Linear Least-Square Example

The following are 8 data points showing the relationship between the number of fishermen and the amount of fish (in thousands of pounds) they can catch in a day.

| Number of Fishermen | Fish Caught |
|---|---|
| 18 | 39 |
| 14 | 9 |
| 9 | 9 |
| 10 | 7 |
| 5 | 8 |
| 22 | 35 |
| 14 | 36 |
| 12 | 22 |

According to this data set, what is the function relating the number of fishermen to the amount of fish caught? Hint: let the number of fishermen be x and the amount of fish caught be y, and use LLS to find the coefficients.

##### Calculation

By simple calculation and statistical knowledge, we can easily find:

1. ${\displaystyle {\bar {X}}}$  = 13
2. ${\displaystyle {\bar {Y}}}$  = 20.625, and
3. the following chart
| X | Y | ${\displaystyle X-{\bar {X}}}$ | ${\displaystyle Y-{\bar {Y}}}$ | ${\displaystyle (X-{\bar {X}})(Y-{\bar {Y}})}$ | ${\displaystyle (X-{\bar {X}})^{2}}$ |
|---|---|---|---|---|---|
| 18 | 39 | 5 | 18.375 | 91.875 | 25 |
| 14 | 9 | 1 | -11.625 | -11.625 | 1 |
| 9 | 9 | -4 | -11.625 | 46.5 | 16 |
| 10 | 7 | -3 | -13.625 | 40.875 | 9 |
| 5 | 8 | -8 | -12.625 | 101 | 64 |
| 22 | 35 | 9 | 14.375 | 129.375 | 81 |
| 14 | 36 | 1 | 15.375 | 15.375 | 1 |
| 12 | 22 | -1 | 1.375 | -1.375 | 1 |

Thus, we have ${\displaystyle \Sigma {(X-{\bar {X}})(Y-{\bar {Y}})}=412}$  and ${\displaystyle \Sigma {(X-{\bar {X}})^{2}}=198}$ , so the slope is ${\displaystyle a={\frac {412}{198}}=2.{\overline {08}}}$ .

And the intercept is ${\displaystyle b=20.625-2.{\overline {08}}\times 13\approx -6.4255}$ .

Therefore, the linear least-squares line is ${\displaystyle Y=f(X)=-6.4255+2.{\overline {08}}X}$ .
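The hand calculation above can be checked with a short script using the same covariance-based formulas:

```python
# Verify the fishermen example: slope = 412/198, intercept = ybar - a*xbar.
xs = [18, 14, 9, 10, 5, 22, 14, 12]
ys = [39, 9, 9, 7, 8, 35, 36, 22]

n = len(xs)
xbar = sum(xs) / n                                            # 13
ybar = sum(ys) / n                                            # 20.625
num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))    # 412
den = sum((x - xbar) ** 2 for x in xs)                        # 198

a = num / den          # slope, about 2.0808
b = ybar - a * xbar    # intercept, about -6.4255
print(a, b)
```

The printed values match the slope and intercept derived in the table above.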

#### Question2: Nonpolynomial example

We want to fit the following data ${\displaystyle (x_{i},y_{i})}$ , where ${\displaystyle 1\leq i\leq n}$ , with a function of the form ${\displaystyle ae^{x}+b\ln x+c\sin x+d\cos x}$ .

| x | y |
|---|---|
| 0.23 | 0.25 |
| 0.66 | -0.27 |
| 0.93 | -1.12 |
| 1.25 | -0.45 |
| 1.75 | 0.28 |
| 2.03 | 0.13 |
| 2.24 | -0.27 |
| 2.57 | 0.26 |
| 2.87 | 0.58 |
| 2.98 | 1.03 |

Write a Matlab program that uses the least-squares method to obtain the estimated function. Hint: input the data in matrix form and solve the system to obtain the coefficients.

In the resulting plot, the blue points are the data and the green points are the estimated nonpolynomial function.
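The assignment asks for a Matlab program; an equivalent NumPy sketch of the same matrix approach looks like this, with one design-matrix column per basis function:

```python
import numpy as np

# Least-squares fit of f(x) = a*e^x + b*ln(x) + c*sin(x) + d*cos(x)
# to the Question 2 data, via the matrix formulation of Lesson 3.
x = np.array([0.23, 0.66, 0.93, 1.25, 1.75, 2.03, 2.24, 2.57, 2.87, 2.98])
y = np.array([0.25, -0.27, -1.12, -0.45, 0.28, 0.13, -0.27, 0.26, 0.58, 1.03])

# Design matrix [X]: one column per basis function evaluated at the data.
X = np.column_stack([np.exp(x), np.log(x), np.sin(x), np.cos(x)])

# Solve the overdetermined system in the least-squares sense.
coeffs, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
a, b, c, d = coeffs
print(a, b, c, d)
```

The problem stays *linear* in the unknowns a, b, c, d even though the basis functions are nonpolynomial, which is why the same normal-equation machinery applies. In Matlab the equivalent solve is `A = X \ y`.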

## References

[1] Cheney, Ward and Kincaid, David. Numerical Mathematics and Computing Fifth Edition. Belmont: Thomson Learning, 2004

[2] Chapra, Steven and Canale, Raymond. Numerical Method for Engineers: With Software and Programming Applications Fourth Edition. McGraw-Hill, 2005

[3] Keller, Gerald. Statistics for Management and Economics Seventh Edition. Thomson Higher Education, 2005