Floating point/Lesson Three

Absolute and Relative Error

Before delving further into how numbers are stored, we should consider how to measure how wrong a stored number is. If the correct answer is 100, then 100.9 might not be a bad approximation. But what if the answer is 1 and we store it as 1.9?

Denote the correct answer by x and the stored value by x0. The absolute error is simply | x - x0 |. In both of the previous examples, the absolute error is 0.9.

On the other hand, it may be important to see what the error is in relation to how large the number is. An error of 0.9 is usually less important when the number is 100 instead of 1. The relative error is defined as | x - x0 |/| x |.

In our two examples, the relative errors are | 100.9 - 100 |/| 100 | = 0.009 and | 1 - 1.9 | / | 1 | = 0.9.
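
These formulas are easy to check in Matlab (the variable names here are our own):

x = 100; x0 = 100.9;
abs(x - x0)              % absolute error: 0.9000
abs(x - x0)/abs(x)       % relative error: 0.0090

x = 1; x0 = 1.9;
abs(x - x0)              % absolute error: 0.9000
abs(x - x0)/abs(x)       % relative error: 0.9000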

Binary Representation in a Computer

In a computer, numbers are stored in normalized floating-point representation. A number is converted to binary and written as ±q × 2ⁿ, where q is a binary number of the form 1.f (a single 1 before the binary point, followed by the fractional bits f), and n is an integer exponent.

The quantity 1.f is called the normalized mantissa; because its leading digit is always 1, only the fractional part f needs to be stored. The exponent n is simply called the exponent.

For instance, 55.5 base 10 = 110111.1 base 2 = 1.101111 base 2 × 2⁵ = 1.101111 × 2¹⁰¹, where the exponent 101 is 5 written in binary.
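
The exponent and normalized mantissa can be recovered numerically in Matlab with the built-in log2 and floor functions (a minimal sketch):

x = 55.5;
n = floor(log2(x))       % exponent: 5
q = x / 2^n              % mantissa: 1.734375, which is 1.101111 in binary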

Limited Storage

The computer has a limited amount of storage space for a number. Suppose it can store three mantissa digits (0.xyz) and a one-digit exponent, which may be positive, negative, or zero. This gives the following numbers (table from Cheney & Kincaid):

0.000 × 2⁰ = 0     0.000 × 2¹ = 0     0.000 × 2⁻¹ = 0
0.001 × 2⁰ = 1/8   0.001 × 2¹ = 1/4   0.001 × 2⁻¹ = 1/16
0.010 × 2⁰ = 1/4   0.010 × 2¹ = 1/2   0.010 × 2⁻¹ = 1/8
0.011 × 2⁰ = 3/8   0.011 × 2¹ = 3/4   0.011 × 2⁻¹ = 3/16
0.100 × 2⁰ = 1/2   0.100 × 2¹ = 1     0.100 × 2⁻¹ = 1/4
0.101 × 2⁰ = 5/8   0.101 × 2¹ = 5/4   0.101 × 2⁻¹ = 5/16
0.110 × 2⁰ = 3/4   0.110 × 2¹ = 3/2   0.110 × 2⁻¹ = 3/8
0.111 × 2⁰ = 7/8   0.111 × 2¹ = 7/4   0.111 × 2⁻¹ = 7/16

Plotted on a number line, these numbers look like this:

0  1/16  1/8 3/16  1/4 5/16  3/8 7/16  1/2      5/8      3/4      7/8       1      5/4, 7/4
X    X    X    X    X    X    X    X    X        X        X        X        X  ... X...X (off screen)

As you can tell, the numbers are not equally spaced.
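
You can reproduce the table and see the uneven spacing directly in Matlab (a small sketch of our toy system; nothing here is built into Matlab):

f = (0:7)/8;                           % the mantissas 0.000 through 0.111
vals = unique([f*2^-1, f*2^0, f*2^1])  % every number the machine can store
diff(vals)                             % the gaps between neighbors grow with the numbers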

You may ask why we included 0.000, 0.001, 0.010, and 0.011 at all. That is a good question: computers assume that the first digit of the mantissa is a 1, which lets them save space by not storing it. This means that, in actuality, the numbers in our machine are the following (real computers normally store the mantissa as 1.f rather than 0.f, but that is not the exercise now):

0.100 × 2⁰ = 1/2   0.100 × 2¹ = 1     0.100 × 2⁻¹ = 1/4
0.101 × 2⁰ = 5/8   0.101 × 2¹ = 5/4   0.101 × 2⁻¹ = 5/16
0.110 × 2⁰ = 3/4   0.110 × 2¹ = 3/2   0.110 × 2⁻¹ = 3/8
0.111 × 2⁰ = 7/8   0.111 × 2¹ = 7/4   0.111 × 2⁻¹ = 7/16

Now, the number line looks like this:

0                  1/4 5/16  3/8 7/16  1/2      5/8      3/4      7/8       1      5/4, 7/4
X                   X    X    X    X    X        X        X        X        X  ... X...X (off screen)

This phenomenon, where the computer misses very small numbers, is known in computer science as the hole at zero. When a number falls into the hole and is rounded down to zero, underflow is said to occur.
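
IEEE double precision, which Matlab uses, narrows (but does not remove) its own hole at zero with subnormal numbers, a refinement our toy system lacks. Using the built-in realmin (see the Matlab Example below):

realmin                  % 2.2251e-308: the smallest normalized positive double
realmin / 2^10           % still nonzero: a subnormal number inside the "hole"
realmin / 2^60           % too small even for a subnormal: underflows to 0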

Rounding

Because the computer can represent only a finite set of numbers, it has to round most answers to a machine number. It can always round up, always round down, or round to the nearest machine number. We will assume the computer rounds to the nearest machine number, as is done in practice.

Always discarding the extra digits, which rounds the number toward zero, is known as chopping.
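
Matlab exposes both behaviors for decimal digits: round rounds to the nearest integer, while fix chops (rounds toward zero):

round(2.7)               % 3: rounds to the nearest integer
round(2.2)               % 2
fix(2.7)                 % 2: the fractional digits are simply discarded
fix(-2.7)                % -2: chopping moves toward zero, not down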

Computer Epsilon

A special number in computers is the computer epsilon (also called machine epsilon). It is defined as the smallest positive number ε such that, in machine arithmetic,

1 + ε ≠ 1.

In the system developed in the previous section, the space between 1 and the next machine number, 5/4, is 1/4. So the computer epsilon is 1/8: anything above 1 + 1/8 rounds to 5/4, and anything below 1 + 1/8 rounds to 1.

In lesson four, we will outline a proof that the roundoff error is at most ε when the computer always rounds up or always rounds down, but only ε/2 when it rounds to the nearest machine number.
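
You can find the computer epsilon of Matlab's own arithmetic with a short halving loop (a standard trick, not anything Matlab-specific):

e = 1;
while 1 + e/2 ~= 1       % keep halving while 1 + e/2 still differs from 1
    e = e/2;
end
e                        % 2.2204e-16 in double precision; compare eps below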

Overflow

Another important phenomenon is overflow. When the computer cannot process a number because it is too large, overflow occurs. An overflow error is extremely important because it stops most computer programs. Underflow is considered less serious: the computer simply rounds the number to zero. This makes sense because the absolute error introduced by underflow is bounded by half the distance between 0 and the smallest positive machine number, whereas with overflow there is no bound on how large the error can be.
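
One caveat: Matlab follows IEEE arithmetic, so rather than stopping the program it returns the special value Inf on overflow; the result is still unusable from that point on:

realmax * 2              % Inf: past the last machine number
(realmax * 2) - realmax  % still Inf: the damage propagates through later arithmetic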

Matlab Example

Matlab has built-in values for the computer epsilon (eps), the real max (realmax), the last machine number before overflow, and the real min (realmin), the first normalized machine number above zero:


>> eps

ans =

  2.2204e-016

>> realmax

ans =

  1.7977e+308

>> realmin

ans =

  2.2251e-308

Homework

1. Assume a system of 2 binary mantissa digits with an exponent of 0 or ±1. Draw a number line indicating the numbers available to the computer if a zero is allowed as the first digit. Then remove the appropriate numbers, as we did in this lesson. How large is the hole at zero? What is the computer epsilon?

2. In the three-digit number system we developed in this lesson (with the hole at zero), perform the following operations:

a) 1/12 + 1/3 (assume that 1/12 is stored in the system BEFORE it is operated on)

b) 1/2 + 1/2

c) 1/8 + 1/3 (assume that the sum is computed exactly BEFORE it is stored)

d) 1 + 9

Source

Cheney, Ward, and David Kincaid. Numerical Mathematics and Computing. Belmont, CA: Thomson, 2004.