Bubble sort

The time necessary for a sort with BubbleSort increases in a quadratic fashion with the number of records. The algorithm works in place and stable. In the best case, when the data records are already sorted, the time increases in a linear fashion with the number of records.

Short Description

The main idea behind BubbleSort is to find the biggest key value and to move that record to the end of the list. This moving is done by comparing the key value of the actual record with the key value of the next record. If the key value of the next record is smaller, the two records are swapped; if the value is bigger, the candidate for having the biggest key value is from then on the next record.

The algorithm works with an outer and an inner loop. After the first repetition of the outer loop the record with the biggest key value is definitely on top of the list. When searching for the second biggest key value, it is not necessary to compare it with the last element, since we already know the result of such a compare; the number of repetitions of the inner loop is decreased by one with every repetition of the outer loop.

Since every record with the actually biggest key value is moved up, it is normally not just one record that moves upwards during a repetition of the outer loop. This is compared with the bubbles in a drink and that's where the algorithm got its name from.

Stability

If there are records with identical key values and they become neighbors during the sort, the upper record becomes the new candidate for having the biggest key value. If the algorithm is realized with a bigger- and not a bigger-or-same- clause, it is stable, because records with identical key value do not change their position relative to each other.

Timing

For an estimate of the runtime we visualize the number of compare operations. In the vertical direction we have the repetitions of the outer loop and in the horizontal direction the number of compare operations.

   1  2  3  4  5  6  7  8  9 10   |   compares
   x  x  x  x  x  x  x  x  x  -   |      9
   x  x  x  x  x  x  x  x  -  -   |      8
   x  x  x  x  x  x  x  -  -  -   |      7
   x  x  x  x  x  x  -  -  -  -   |      6
   x  x  x  x  x  -  -  -  -  -   |      5
   x  x  x  x  -  -  -  -  -  -   |      4
   x  x  x  -  -  -  -  -  -  -   |      3
   x  x  -  -  -  -  -  -  -  -   |      2
   x  -  -  -  -  -  -  -  -  -   |      1
   -  -  -  -  -  -  -  -  -  -   |      0

Half of the n*n matrix is filled except for the diagonal; that makes ½ n² - n compares. Since we don't need the exact number but the rough behavior, we calculate the number of compare operations with

N_C = ½ n²

Worst Case & Normal Case

In order to get the worst case concerning the swap operations, we assume that the list is sorted in reverse order. Then every compare operation is followed by a swap.

N_S = ½ n²

Every compare- and swap-operation needs time. We evaluate the time for a complete sort with

T_Sort = ½ * C * n² + ½ * S * n²

T_Sort = n² * (½ C + ½ * S)

T_Sort = B * n²

where B, C and S are constants and n the number of records to be sorted.

With an even distribution of the key values the calculation of the necessary number of steps becomes a bit complicated and the correct evaluation can be found here ^[1]. It increases with ¼ n². The time for a sort will be in average twice as fast as for the worst case. But the timing still follows the quadratic characteristic, only the actual value of B changes.

Termination: The loop terminates at j=i+1. At that point A[i]  A[i+1], A[i]  A[i+2], …, A[i]  A[n]. So, A[i] is the smallest element between A[i] through A[n]. QED

Theorem: BubbleSort correctly sorts the input array A.

Loop invariant for lines 1 through 4 is that A[i]  A[i+1].

Initialization: Starts with A[1].

Maintenance: After the iteration i=p, for some 1<pn, the above lemma for lines 2 through 4 guarantees that A[p] is the smallest element from A[p] through A[n]. So, A[p]  A[p+1].

Termination: The loop 1 through 4 terminates at i=n. At that point A[1]  A[2]  . . .  A[n]. Hence, the correctness theorem is proved. QED

Complexity:  i=1n  j=i+1n 1 =  i=1n (n –i) = n2 – n(n+1)/2 = n2/2 – n/2 = (n2)

Best Case

Now we come to the best case: the records are already sorted. Without any additions to the algorithm we still have the same number of compare operations, but no swaps. This information about "no swaps" can be used to speed up the sort when the records are sorted or nearly sorted. At the beginning of the outer loop we set a marker to NO_SWAPS_DONE. In the inner loop this value is set to SWAPS_DONE in case any records were swapped. At the end of the outer loop we check the value of this marker. If there were no swaps, the records are sorted and the algorithm can terminate. We have a best case behavior of

T_Best = n * C

Nearly Sorted

Now we have a look at a nearly sorted situation by adding a new record at the end of the list; and the key value is the smallest of all records. The number of compare operations will still be

N_C = ½ n²

and the number of swaps will be

N_S = n

Compared with algorithms like InsertionSort the timewise behavior is pretty bad.

Example

In the list to be sorted are 10 records. A blue background indicates that there has to be a swap of data records. A green background indicates, that the upper record becomes the new candidate for the biggest key value.

First repetition of outer loop

387	310	529	174	011	227	020	901	522	771
310	387	529	174	011	227	020	901	522	771
310	387	529	174	011	227	020	901	522	771
310	387	174	529	011	227	020	901	522	771
310	387	174	011	529	227	020	901	522	771
310	387	174	011	227	529	020	901	522	771
310	387	174	011	227	020	529	901	522	771
310	387	174	011	227	020	529	901	522	771
310	387	174	011	227	020	529	522	901	771
310	387	174	011	227	020	529	522	771	901

Second repetition of outer loop

310	387	174	011	227	020	529	522	771	901
310	387	174	011	227	020	529	522	771	901
310	174	387	011	227	020	529	522	771	901
310	174	011	387	227	020	529	522	771	901
310	174	011	227	387	020	529	522	771	901
310	174	011	227	020	387	529	522	771	901
310	174	011	227	020	387	529	522	771	901
310	174	011	227	020	387	522	529	771	901
310	174	011	227	020	387	522	529	771	901

Third repetition of outer loop

310	174	011	227	020	387	522	529	771	901
174	310	011	227	020	387	522	529	771	901
174	011	310	227	020	387	522	529	771	901
174	011	227	310	020	387	522	529	771	901
174	011	227	020	310	387	522	529	771	901
174	011	227	020	310	387	522	529	771	901
174	011	227	020	310	387	522	529	771	901
174	011	227	020	310	387	522	529	771	901

Fourth repetition of outer loop

174	011	227	020	310	387	522	529	771	901
011	174	227	020	310	387	522	529	771	901
011	174	227	020	310	387	522	529	771	901
011	174	020	227	310	387	522	529	771	901
011	174	020	227	310	387	522	529	771	901
011	174	020	227	310	387	522	529	771	901
011	174	020	227	310	387	522	529	771	901

Fifth repetition of outer loop

011	174	020	227	310	387	522	529	771	901
011	174	020	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901

Sixth repetition of outer loop

011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901
011	020	174	227	310	387	522	529	771	901

The algorithm terminates after this loop due to the fact that there were no swaps done.

↑ Knuth, Donald E.: The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd Edition, Addison-Wesley, 1981

[1] Knuth, Donald E.: The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd Edition, Addison-Wesley, 1981

[1]