# Data structures/Transwiki Data Structures

## Introduction

Computer scientists have created many ways to store data. For example, they have designed several ways to store a list. A computer might be able to look at the data in one kind of list very quickly. Another type of list might trade off retrieval speed for quick searches for a specific piece of data. These data structures, with their various tradeoffs, are the subject of this course.

It is important not to confuse data structures with object classes. Data structures are designed by computer scientists to tell the computer exactly how to physically organize and work with the data. Classes are combinations of data structures with algorithms working with these structures, designed by programmers to solve actual problems.

Pseudocode that is close to the Pascal programming language appears here. There are some examples of the code in other languages also.

## Arrays

An array, in general, is a constant-sized sequential set of blocks, each block containing a value of the elected datatype. The use of arrays is integral in most high-level languages today. Versions of dynamic arrays exist, but this approach is usually costly, because arrays expanded into sizes larger than is available at the specific location where the array is stored, would require the copying of the entire array into an area large enough to hold the expanded array. For example, consider an array of 5 1-byte-long blocks, which sits at address 1000, and assume there is a variable X, 1 byte long, sitting at address 1007. Should you wish to extend your array to hold 10 blocks, you would need to copy the first 5 blocks to another location in order to accommodate the rest of the array. Another disadvantage of the sequential storage method is that there has to be a free sequential block large enough to hold the array. If you have an array of 500,000,000 blocks, each 1 byte long, you need to have roughly 500 megabytes of sequential space to be free; Sometimes this will require a defragmentation of the memory, which takes a long time.

• Constant size
• Constant data-type
• Large free sequential block to accommodate large arrays

One way to make a list of data in a certain order is to treat it like a chain. Each link in the chain tells you where its piece of data is and where the next link in the chain is.

Let's make a linked list version of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] :

```class Node
var int data
var Node* next
class List
function insert(data)
function remove(data)
var Node* previous := None
var boolean found := false
while (current != None) and (found == False)
if data == current.data
found := True
else
previous := current
current := current.next
if found == True
previous := current.next
```

This is called a singly-linked list, because each node has only a single link to the next node. You can only go along the list in one direction.

Special features:

• You are required to keep a reference to the head node.
• It is simple to think about and implement.
• It is hard to make errors.
• It is easy to figure out when you've gone through all the elements.

Efficiency features:

• It needs little memory (two references per item).
• It will quickly get to the items near the head of the list and the next item in the list.
• It will slowly get to the items near the tail of the list and the previous item in the list.
• It is slow at searching for a specific item.

A common optimization is to also keep a reference to the tail node. This allows new nodes to be quickly inserted at the end of the list.

Let's do the same thing with a small change:

```class ListNode
attr_accessor :prev_node, :item, :next_node

# Add the next item in the list
def insert(new_item)
@next_node = ListNode.new
@next_node.prev_node = self
@next_node.item = new_item
end

def inspect
"#{@item.inspect}, #{@next_node.inspect}"
end
end

node = ListNode.new
node.item = 0
node = node.next_node
end

# Move from the tail of the list to the head of the list.
node = node.prev_node until node.prev_node.nil?

puts "The list: #{node.inspect}."
puts "The third item: #{node.next_node.next_node.item}."
```

This is called a doubly-linked list, because each node has links to both the previous and next nodes. You can go along the list in either direction.

Special features:

• You are no longer required to keep a reference to the head node.
• It is simple to think about and implement
• Errors can easily be made when the next node doesn't believe that this is its previous node or the previous node disagrees that this is its next node.
• It is easy to figure out when you've gone through all the elements.

Efficiency features:

• It needs little memory (three references per item).
• It will quickly get to the previous or next item in the list.
• It will slowly get to items near the head or tail of the list.
• It is slow at searching for a specific item.

Some common optimizations are to keep references to the head and tail nodes. This allows quick moves to either end of the list and quick insertions of new nodes at the end of the list.

## Stacks

Stacks, as well as queues, are variations on the linked list. Stacks are also known as Last In First Out (LIFO) data structures, and this is indicative of their behavior. Simply put, when you put an item into a stack, generally called a 'push' operation, you are putting that element at the 'top' of the stack. When you remove an item from the stack, commonly known as a 'pop' operation, you're removing an element from the 'top' of the stack.

One common metaphor for a stack data structure is a vertical stack of dishes. The only feasible way to add a dish to the pile is to place it on top, and likewise, the only reasonable way to remove a dish is to remove it from the top.

#### An Example

``` Stack of numbers: (bottom) 2, 4, 9, 1, 3, 8 (top)
```

When you remove the first item, you'll get an 8. Making the stack as follows:

``` 2, 4, 9, 1, 3
```

Then when you "pop" the stack again, you'll get a 3, leaving the stack in this state:

``` 2, 4, 9, 1
```

If you were then to add a 7 to the stack, 'pushing' it on, the stack would become:

``` 2, 4, 9, 1, 7
```

and so on.

#### Real world use

Stacks are generally implemented such that pushing and popping operations can be performed very quickly, in constant time.

## Trees

### Binary Trees

Are quickly searched, but may become unbalanced as nodes are added and deleted.

### B+ Trees

Are similar to Binary trees, but overcome the short-comings with the addition of self-balancing insert and delete functions.

### Oct-trees

A number of other Tree data structures exist, in a variety of flavors.

## Heaps

A heap is a data structure which implements the priority queue abstract data type. The most important operation a priority queue defines is the retrieval of an element with minimum (or maximum) value. This versatility of the heap stems from the fact that which of the maximum or minimum is at the root depends on the comparison function we provide the heap. The heap invariant is as follows: all children must have a greater (or lesser) value than their parents. This implies that the smallest (or largest) value must be located at the root. The heap facilitates this operation by moving the node of least (or most) value to the root of the heap after each insertion operation. This transformation is known as re-heapification.

Since a heap is equivalent to a balanced full binary tree, we can economize and represent it as a flat array. This has the advantage of allowing us to compute the child or parent of an arbitrary node given its index with a simple arithmetic expression, rendering explicit parent or child pointers unnecessary.

The re-heapify operation is bounded by the height of the tree, and movement through the tree is only in one direction. The worst case is when swaps have to be performed all the way from leaf to root, hence it is O(log n).

A heap can be used to implement heapsort, an optimal sorting algorithm, simply by inserting keys into the heap, then removing them one by one. Due to the heap invariant, as they are extracted they will be in sorted order.

``` class Heap

def to_s
@heap.to_s
end

def initialize
@heap = Array.new
@size = 0
end

def <<(item)
@heap << item
@size += 1
reheapify_up
end

def pop
ret = @heap[0]
@heap[0] = @heap.pop
reheapify_down()

@size -= 1
return ret
end

def size
@size
end

def empty?
@size == 0
end

private
def reheapify_up
ptr = @heap.length - 1
while ptr > 0
if (@heap[parent_of(ptr)] <=> @heap[ptr]) > 0
swap(parent_of(ptr), ptr)
end
ptr = parent_of(ptr)
end
end

def reheapify_down
ptr = 0

while (2*ptr+1) < @heap.size
kid = 2*ptr + 1

if kid < @heap.size-1 and @heap[kid] > @heap[kid+1]
kid += 1
end

if @heap[ptr] < @heap[kid] then
break
else
swap(kid, ptr)
ptr = kid
end
end
end

def parent_of(ind)
(ind-1) / 2
end

def swap(a, b)
@heap[a], @heap[b] = @heap[b], @heap[a]
end
end
```

## Hash tables

In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number). It works by transforming the key using a hash function into a hash, a number that the hash table uses to locate the desired value.