# Floating point/Lesson Three

### Absolute and Relative Error

Before delving further into how numbers are stored, we should decide how to measure how wrong a stored number is. If the correct answer is 100, then 100.9 might not be a bad approximation. But what if the answer is 1, and we store the number as 1.9?

Denote the correct answer as *x*, and the stored value as *x*_{0}. Then, the **absolute error** is simply | *x* - *x*_{0} |. Thus, in the previous example, our absolute error was 0.9 (in both cases).

On the other hand, it may be important to see what the error is in relation to how large the number is. An error of 0.9 is usually less important when the number is 100 instead of 1. The **relative error** is defined as | *x* - *x*_{0} |/| *x* |.

In our two examples, the relative errors are | 100.9 - 100 |/| 100 | = 0.009 and | 1 - 1.9 | / | 1 | = 0.9.
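These two definitions translate directly into code. Here is a minimal sketch in Python (the helper names `abs_error` and `rel_error` are ours, introduced for illustration):

```python
def abs_error(x, x0):
    """Absolute error |x - x0| between the true value x and the stored value x0."""
    return abs(x - x0)

def rel_error(x, x0):
    """Relative error |x - x0| / |x|; x must be nonzero."""
    return abs(x - x0) / abs(x)

# The two examples from above: same absolute error, very different relative error.
print(abs_error(100, 100.9), rel_error(100, 100.9))  # ~0.9   ~0.009
print(abs_error(1, 1.9), rel_error(1, 1.9))          # ~0.9   ~0.9
```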

### Binary representation in a Computer

In a computer, numbers are stored in **normalized floating-point representation**. The number is converted to binary and written as ±*x* × 2^{n}, where *x* is a binary number of the form 1.f (a single 1 before the binary point, followed by the fractional bits *f*), and *n* is an integer exponent.

The fractional portion *f* (the bits after the leading 1 in 1.f) is called the **normalized mantissa**. The exponent *n* is simply called the **exponent**.

For instance, 55.5 base 10 = 110111.1 base 2 = 1.101111 base 2 × 2^{5} = 1.101111 base 2 × 2^{101 base 2} (the exponent 101 is 5 written in binary).
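You can inspect this normalized form for Python's own floating-point numbers. `math.frexp` returns the mantissa-exponent pair using the 0.f convention, which one bit-shift converts to the 1.f form used here (the example value 55.5 is from the text; everything else is illustrative):

```python
import math

x = 55.5
m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= m < 1
print(m, e)                       # 0.8671875 6, i.e. 0.1101111 (base 2) x 2^6
# Shift one bit into the integer part to get the normalized 1.f form:
print(2 * m, e - 1)               # 1.734375 5, i.e. 1.101111 (base 2) x 2^5
print(x == (2 * m) * 2 ** (e - 1))  # True: same value either way
print(x.hex())                    # '0x1.bc00000000000p+5' (1.f form in hex)
```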

### Limited Storage

The computer has a limited amount of storage space for a number. Let's say that this amount is three binary digits (0.xyz), with a one-digit exponent that can be −1, 0, or 1. Thus, we have the following numbers (table from Cheney & Kincaid):

| Exponent 0 | Exponent 1 | Exponent −1 |
|---|---|---|
| 0.000 × 2⁰ = 0 | 0.000 × 2¹ = 0 | 0.000 × 2⁻¹ = 0 |
| 0.001 × 2⁰ = 1/8 | 0.001 × 2¹ = 1/4 | 0.001 × 2⁻¹ = 1/16 |
| 0.010 × 2⁰ = 1/4 | 0.010 × 2¹ = 1/2 | 0.010 × 2⁻¹ = 1/8 |
| 0.011 × 2⁰ = 3/8 | 0.011 × 2¹ = 3/4 | 0.011 × 2⁻¹ = 3/16 |
| 0.100 × 2⁰ = 1/2 | 0.100 × 2¹ = 1 | 0.100 × 2⁻¹ = 1/4 |
| 0.101 × 2⁰ = 5/8 | 0.101 × 2¹ = 5/4 | 0.101 × 2⁻¹ = 5/16 |
| 0.110 × 2⁰ = 3/4 | 0.110 × 2¹ = 3/2 | 0.110 × 2⁻¹ = 3/8 |
| 0.111 × 2⁰ = 7/8 | 0.111 × 2¹ = 7/4 | 0.111 × 2⁻¹ = 7/16 |

We have the following numbers:

0, 1/16, 1/8, 3/16, 1/4, 5/16, 3/8, 7/16, 1/2, 5/8, 3/4, 7/8, 1, 5/4, 3/2, 7/4

As you can tell, the numbers are not equally spaced.

You may ask: why did we include 0.000, 0.001, 0.010, and 0.011 at all? That is a good question, because computers *assume* that the first bit of the mantissa is a 1, so it need not be stored. Computers do this to save space. This means that, in actuality, the numbers in our machine are the following (computers normally store the form 1.f, but we keep 0.1xy for this exercise):

| Exponent 0 | Exponent 1 | Exponent −1 |
|---|---|---|
| 0.100 × 2⁰ = 1/2 | 0.100 × 2¹ = 1 | 0.100 × 2⁻¹ = 1/4 |
| 0.101 × 2⁰ = 5/8 | 0.101 × 2¹ = 5/4 | 0.101 × 2⁻¹ = 5/16 |
| 0.110 × 2⁰ = 3/4 | 0.110 × 2¹ = 3/2 | 0.110 × 2⁻¹ = 3/8 |
| 0.111 × 2⁰ = 7/8 | 0.111 × 2¹ = 7/4 | 0.111 × 2⁻¹ = 7/16 |

Now, the number line looks like this:

0, 1/4, 5/16, 3/8, 7/16, 1/2, 5/8, 3/4, 7/8, 1, 5/4, 3/2, 7/4

This phenomenon, where the computer cannot represent numbers very close to zero, is known in computer science as the **hole at zero**. When a number falls in the hole and is rounded down to zero, this is called **underflow**.
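The toy system above is small enough to enumerate exhaustively. A sketch in Python using exact rational arithmetic (the variable names are ours, not from the text):

```python
from fractions import Fraction

# Normalized mantissas 0.100, 0.101, 0.110, 0.111 (leading bit forced to 1),
# i.e. 4/8 through 7/8, combined with exponents -1, 0, 1.
mantissas = [Fraction(m, 8) for m in range(4, 8)]
exponents = [-1, 0, 1]

machine_numbers = sorted({m * Fraction(2) ** e
                          for m in mantissas for e in exponents})
print(machine_numbers)        # 12 values, from 1/4 up to 7/4

# Everything strictly between 0 and the smallest positive machine number
# is the "hole at zero".
print(machine_numbers[0])     # 1/4
```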

### Rounding

Since the computer can only represent finitely many numbers, it has to round results to a machine number. It can always round up, always round down, or round to the nearest machine number. We will assume that the computer rounds to the nearest number, which is what is done in practice as well.

Always rounding toward zero (simply discarding the digits that do not fit) is known as **chopping**.
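The difference between the two rules can be sketched for our toy system (the helper names are ours; this `chop` handles nonnegative inputs only):

```python
from fractions import Fraction

# Machine numbers of the toy system (with the hole at zero), plus zero itself.
MACHINE = sorted(
    {Fraction(m, 8) * Fraction(2) ** e for m in range(4, 8) for e in (-1, 0, 1)}
    | {Fraction(0)}
)

def round_nearest(x):
    """Round x to the closest machine number (ties resolved arbitrarily by min)."""
    return min(MACHINE, key=lambda m: abs(m - x))

def chop(x):
    """Chop: discard the extra digits, i.e. round toward zero (x >= 0 here)."""
    return max(m for m in MACHINE if m <= x)

x = Fraction(9, 25)        # 0.36, between 5/16 = 0.3125 and 3/8 = 0.375
print(round_nearest(x))    # 3/8, since 0.375 is closer to 0.36
print(chop(x))             # 5/16, rounding toward zero
```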

### Computer Epsilon

A special number in computers is the **computer epsilon** (also called machine epsilon). It is defined as the smallest positive number ε such that, in machine arithmetic:

1 + ε ≠ 1.

In our special computing system developed in the previous section, the space between 1 and the next machine number, 5/4, is 1/4. So the computer epsilon is 1/8: anything between 1 + 1/8 and 5/4 rounds to 5/4, and anything between 1 and 1 + 1/8 rounds back to 1.
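For IEEE double precision (what Python floats use), the same 1 + ε ≠ 1 test can be run with the classic halving loop; this is an illustrative sketch, not part of the original lesson:

```python
# Halve eps until adding it to 1 no longer changes the stored result.
eps = 1.0
while 1.0 + eps / 2 != 1.0:
    eps /= 2
print(eps)   # 2.220446049250313e-16, i.e. 2**-52
```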

In lesson four, we will outline a proof that the roundoff error when digits are always rounded up or always rounded down is at most ε, but when rounding is done to the *nearest* machine number, the error is at most ε/2.

### Overflow

Another important phenomenon is **overflow**. When the computer cannot represent a number because it is too large, it overflows. An overflow error is extremely important, because it stops most computer programs. Underflow is considered less serious: the computer simply rounds the number to zero. This makes sense because the absolute error from underflow is bounded by half the distance between 0 and the smallest positive machine number, whereas in overflow there is no bound on how large the error is.
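A quick illustration with Python doubles. Note one caveat (an IEEE 754 detail beyond this lesson's toy system): doubles have *subnormal* numbers that partially fill the hole at zero, so underflow to zero happens only below the smallest subnormal:

```python
import sys

big = sys.float_info.max   # largest finite double, about 1.7977e+308
print(big * 2)             # inf: overflow, no finite result is close enough
tiny = 5e-324              # smallest positive (subnormal) double
print(tiny / 4)            # 0.0: underflow, rounded down to zero
```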

### Matlab Example

Matlab has built-in values for the computer epsilon, for the **real max**, the last machine number before overflow, and for the **real min**, the first machine number after zero:

    >> eps
    ans = 2.2204e-016
    >> realmax
    ans = 1.7977e+308
    >> realmin
    ans = 2.2251e-308
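The same three values are available in Python via `sys.float_info` (a rough analogue we add for illustration; note that `sys.float_info.min` is the smallest *normalized* double, and subnormal values go below it):

```python
import sys

print(sys.float_info.epsilon)   # 2.220446049250313e-16   (Matlab's eps)
print(sys.float_info.max)       # 1.7976931348623157e+308 (realmax)
print(sys.float_info.min)       # 2.2250738585072014e-308 (realmin)
```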

### Homework

1. Assume a system of 2 binary digits with an exponent of 0 or ±1. Draw a number line indicating the available numbers to the computer, if a zero is allowed as the first number. Then, remove the appropriate numbers. How large is the hole at zero? What is computer epsilon?

2. In the three digit number system we developed in this lesson (with the hole at zero), perform the following operations:

a) 1/12 + 1/3 (assume that 1/12 is stored in the system BEFORE it is operated on)

b) 1/2 + 1/2

c) 1/8 + 1/3 (assume that the number is added BEFORE it is stored)

d) 1 + 9

### Source

Cheney, Ward, and David Kincaid. *Numerical Mathematics and Computing*. Belmont, CA: Thomson, 2004.