Applied Programming/Dictionaries and Sets

This lesson introduces dictionaries and sets.

Objectives and Skills edit

Objectives and skills for this lesson include:

Readings edit

  1. Wikipedia: Data structure
  2. Wikipedia: Dictionary (data structure)
  3. Wikipedia: Hash table
  4. Wikipedia: Key-value database
  5. Wikipedia: Set (abstract data type)
  6. Python for Everyone: Dictionaries

Multimedia edit

  1. YouTube: Python Dictionary
  2. YouTube: Sort by Lambda
  3. YouTube: Python Tutorial - 24. Sets and Frozen Sets
  4. YouTube: Python: Combining Dictionaries with Lists
  5. YouTube: Python Programming Tutorial- 15 Using Dictionaries with For Loops
  6. YouTube: How to Sort a List of Dictionaries
  7. YouTube: Reading CSV File Into a Python Dictionary

Examples edit

Activities edit

  1. Download GitHub: Northwind Customers. Write a program to read the file and create a customers list, where each element in the list is a dictionary of information for a given customer.
  2. Provide an interface for the program above that allows the user to:
    • Display company name, contact name, and phone number for all customers sorted by company name.
    • Display contact name, company name, and phone number for all customers sorted by contact name.
    • Search for a given company name or part of a name and display matching records with fields labeled.
    • Search for a given contact name or part of a name and display matching records with fields labeled.
  3. For each of the above, use separate functions for each type of processing. Reuse functions where possible, such as in sorting and searching. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.

Lesson Summary edit

  • A data structure is a particular way of organizing and storing data in a computer so that it can be accessed and modified efficiently.[1]
  • An associative array, map, or dictionary is an abstract data type that collects key-value pairs, where each key only appears a single time in the container.[2]
    • This is often implemented using a hash table—an array with a hash function that maps keys to indices, avoiding collisions when possible.[2]
    • These data structures are unordered collections of unique key-value pairs that do not maintain the elements' position or order, unlike lists or tuples. [3]
    • Dictionaries vary from lists since lists are a sequence of objects accessed in a certain order. Dictionaries can hold any object and use certain keys to access them. In a list, looping and comparing have to be used to check for a certain variable while dictionaries just need a specific integer or string.
  • The operations that are usually defined for an associative array are:[2]
    • Add or insert: add a new key, value pair to the collection, binding the new key to its new value. The arguments to this operation are the key and the value.[2]
    • Reassign: replace the value in one of the key, value pairs that are already in the collection, binding an old key to a new value. As with an insertion, the arguments to this operation are the key and the value.[2]
    • Remove or delete: remove a key, value pair from the collection, unbinding a given key from its value. The argument to this operation is the key.[2]
    • Lookup: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some associative array implementations raise an exception.[2]
  • In addition, associative arrays may also include other operations such as determining the number of bindings or constructing an iterator to loop over all the bindings.[2]
  • Dictionaries are typically implemented in programming languages as either a hash table or a search tree. [4]
    • Python dictionaries are collections that are indexed by keys, which can be any immutable type.
    • Dictionaries may be nested and are accessed using the syntax dictionary[key][subkey]
    • Dictionary items may be accessed with a for loop.
    • Dictionaries are unordered. Dictionary items may be displayed in key order by sorting a list of keys using the sorted() function and the dictionary.keys()method.
  • Serialization, which produces a text or binary representation of the original objects that can be written directly to a file, offers a solution to use associative arrays in permanent form.[2]
  • Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules.[5]
  • When storing a value into a hash table, its key is manipulated to produce a valid array index (non-negative integer). As long as there is no element occupying the location, the value is then placed into this 'bucket'.[5]
    • When retrieving the value using the associated key, this process is reversed. This, of course, becomes tricky when collisions have to be resolved.[5]
  • The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found.[5]
  • The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large.[5]
  • Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.[5]
  • Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. Therefore, almost all hash table implementations have some collision resolution strategy to handle such events, such as:[5]
    • Separate chaining[5]
    • Open addressing[5]
    • Robin Hood hashing[5]
    • 2-choice hashing[5]
  • A critical statistic for a hash table is the load factor, defined as: Load Factor = n/k[5]
    • n represents the number of entries occupied in the hash table and whereas k is the number of buckets. As the load factor grows larger, the hash table becomes slower.[5]
  • If one considers every structure yielded by packaging and/or indexing, there are four basic data structures:[6]
    • unpackaged, unindexed: bunch[6]
    • packaged, unindexed: set[6]
    • unpackaged, indexed: string (sequence)[6]
    • packaged, indexed: list (array)[6]
  • A set is a data structure that stores values without any particular order and without repeated values.[6]
  • Unlike other containers, rather than retrieving a specific element, one typically tests a value for membership in the set.[6]
  • Depending on the language, sets may be frozen, meaning they do not change after they are constructed.[6]
    • Static sets allow only query operations on their elements — such as checking whether a given value is in the set or enumerating the values in some arbitrary order.[6]
  • Dynamic or mutable sets allow the insertion and deletion of elements from the set.[6]
  • Key-value databases can use consistency models ranging from eventual consistency to serializability. Some support ordering of keys. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks.[7]
  • Because optional values are not represented by placeholders as in most Relational Databases, key-value databases often use far less memory to store the same database.[7]
  • Key-value systems treat the data as a single opaque collection, which may have different fields for every record.This offers considerable flexibility and more closely follows modern concepts like object-oriented programming.[7]
  • There are multiple ways to create a dictionary in Python.
    • The dict() constructor builds dictionaries directly from sequences of key-value pairs.[8]
      • Exampleː dimensions = dict([('width', 12), ('length', 10), ('depth', 3)]).
    • Dictionary comprehensions can be used to create dictionaries from arbitrary key and value expressions.[8]
      • Exampleː dimensions = {width: 12, lengthː 10, depthː 3}.
    • When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments.[8]
      • Exampleː dimensions = dict(width=12, length=10, depth=3).

Key Terms edit

associative array, dictionary, hash, map
Alternative names for a data structure that contains key-value pairs, where each key only appears a single time in the container.[2]
coalesced hashing
A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes within the table itself.[5]
frozen set
Also known as static sets, these sets do not change after they are constructed. Static sets allow only query operations on their elements.[6]
hash collisions
Occur when two entries are generated using the same index key. This has a high chance of occurring when hashing a random subset of a large set of possible keys.[5]
hash table (hash map)
is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.[5]
key
A datum used to access the value placed in the corresponding dictionary bucket. If the ID number of an employee were a unique field, entering this as the 'key' would pull up the rest of the employee's information.[5]
multiset
Similar to a plain set but one that allows for repeated values and duplicates.[6]
merge algorithms
are a family of algorithms that take multiple sorted lists as input and produce a single list as output, containing all the elements of the inputs lists in sorted order. These algorithms are used as subroutines in various sorting algorithms, most famously merge sort.[9]
multimap
generalizes an associative array by allowing multiple values to be associated with a single key.[2]
serialization
Serialization produces a text or binary representation of the original objects that can be written directly to a file and offers a solution to use associative arrays in permanent form.[2]
set
A data structure that stores values without any particular order and without repeated values.[6]
value
The data is retrieved using the key. It doesn't have to be a single object; it could be another container or structure itself.[5]

See Also edit

References edit