Python Concepts/List Comprehension

Objective

  • To understand and use List Comprehensions, including Nested List Comprehensions.

Lesson

To create a list of squares of numbers in range(1,6) we can do something like:

>>> a = []
>>> for x in range(1,6) : a += [x*x]
... 
>>> a
[1, 4, 9, 16, 25]

This approach has the perhaps undesirable side-effect of creating or re-assigning both a and x. It would be convenient if we could do something like:

[for x in range(1,6) :  [x*x]] # This produces SyntaxError: invalid syntax.

If we change the syntax slightly, List Comprehensions come to the rescue.

List Comprehensions

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.


Consider the task mentioned above, to create a list of squares of numbers in range(1,6). Modify the syntax slightly, and put it into the form of a List Comprehension or "listcomp":

>>> [x*x for x in range(1,6)]
[1, 4, 9, 16, 25]
>>>

In the listcomp above, x is local to the listcomp (in Python 3 the loop variable does not leak into the enclosing scope), and the closing bracket ']' tells Python that the expression is complete.
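The scoping behaviour can be checked directly. The following is a minimal sketch, assuming Python 3; the names squares and y are illustrative only:

```python
x = 'unchanged'

# The x inside the brackets is a separate, local name.
squares = [x * x for x in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]
print(x)        # unchanged

# A plain for loop, by contrast, leaves its loop variable behind.
for y in range(1, 6):
    pass
print(y)        # 5
```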


A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list. Consider some examples:


Given:

>>> a = list(range(-5,7)) ; a
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
>>>

Create a shallow copy of list a (a new list object that holds the same elements):

>>> b = [p for p in a] ; b
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
>>> a is b
False # a and b are different entities.
>>>
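A copy made with a listcomp of this form is a new list, but its elements are the same objects as in the original. The difference only shows when the elements are themselves mutable. A minimal sketch follows, in which the list nested is a made-up example rather than part of the lesson, and copy.deepcopy from the standard library is used for comparison:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = [row for row in nested]   # new outer list, same inner lists
deep = copy.deepcopy(nested)        # inner lists are copied as well

nested[0].append(99)
print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```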

Display the position and value of all odd numbers in list a:

>>> [(i,a[i]) for i in range(len(a)) if a[i] % 2]
[(0, -5), (2, -3), (4, -1), (6, 1), (8, 3), (10, 5)]
>>>

The listcomp above is equivalent to:

b = [
 ( 0,  a[0]),
#( 1,  a[1]), not included. a[1] is even.                                                                               
 ( 2,  a[2]),
#( 3,  a[3]), not included. a[3] is even.                                                                               
 ( 4,  a[4]),
#( 5,  a[5]), not included. a[5] is even.                                                                               
 ( 6,  a[6]),
#( 7,  a[7]), not included. a[7] is even.                                                                               
 ( 8,  a[8]),
#( 9,  a[9]), not included. a[9] is even.                                                                               
 (10, a[10]),
#(11, a[11]) not included. a[11] is even.                                                                               
]

print ('b =', b)
b = [(0, -5), (2, -3), (4, -1), (6, 1), (8, 3), (10, 5)]
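The same result can probably be written more directly with the built-in enumerate(), which yields position and value together. This is a sketch of one alternative, not a replacement for the form shown above; the name odd_items is illustrative:

```python
a = list(range(-5, 7))

# enumerate() yields (position, value) pairs, so a[i] need not be looked up.
odd_items = [(i, value) for i, value in enumerate(a) if value % 2]
print(odd_items)
# [(0, -5), (2, -3), (4, -1), (6, 1), (8, 3), (10, 5)]
```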

Display all numbers in list a that are an exact integer power of 2:

>>> [num for num in a for exponent in range(5) if num == 2 ** exponent ]
[1, 2, 4]
>>> 
>>> [ # Same listcomp again with some formatting.
... num
...     for num in a 
...         for exponent in range(5)
...   if num == 2 ** exponent ] # Between the brackets white space is almost irrelevant.
[1, 2, 4]
>>>

The above listcomp is equivalent to:

>>> b = []
>>> a = list(range(-5,7)) ; a
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
>>> for num in a :
...     for exponent in range(5) :
...          if num == 2 ** exponent : b += [num]
... 
>>> b
[1, 2, 4]
>>>
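When the inner loop only tests membership, one possible simplification is to build the candidate values once as a set and filter against it. A sketch under that assumption (the names powers_of_two and result are illustrative):

```python
a = list(range(-5, 7))

# A set comprehension collects the powers once: {1, 2, 4, 8, 16}.
powers_of_two = {2 ** exponent for exponent in range(5)}

# A single for clause with a membership test replaces the nested loop.
result = [num for num in a if num in powers_of_two]
print(result)  # [1, 2, 4]
```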

Apply a math function to all elements:

>>> [(2*x*x + 3*x - 2) for x in a]
[33, 18, 7, 0, -3, -2, 3, 12, 25, 42, 63, 88]
>>>

Create a list of random non-negative integers:

import random

b = [
random.getrandbits(p+q-r)
    for p in range(10,20,4)
    for q in range(3,17,5)
    for r in range(4,16,7)
        if ((p+q-r) > 12) and ((p+q-r) % 3)
]

print ('b =', b)

Successive invocations of the above produce:

b = [1028, 193714, 7439, 6543953, 54358, 116875, 3156193, 644588]
b = [1665, 267591, 813, 6463934, 35700, 112421, 2943558, 640640]
b = [4370, 330632, 8045, 5037055, 5526, 44136, 2528805, 612774]
b = [14764, 21409, 8140, 3436253, 55785, 96832, 1355388, 162655]
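The results differ between runs because the generator starts from a different state each time. When repeatable output is wanted, for example in a test, one option is a private random.Random instance with a fixed seed. In the sketch below the seed 2018, the list bit_counts and the helper make_list are arbitrary choices made for illustration:

```python
import random

bit_counts = [14, 19, 13, 23]   # number of random bits requested per element

def make_list(seed):
    rng = random.Random(seed)   # independent generator with a known seed
    return [rng.getrandbits(bits) for bits in bit_counts]

first = make_list(2018)
second = make_list(2018)
print(first == second)   # True
```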

Call a method on each element:

>>> fruits = ['  apples  ', '  pears   ', ' grapes   ', '  WaterMELons   ', ' peaches  ' ]
>>> [fruit.strip().lower() for fruit in fruits ]
['apples', 'pears', 'grapes', 'watermelons', 'peaches']
>>>

To flatten a list:

>>> L1 = [32, 97, [192, 128], 98, 99, 32, [192, 128], 32]
>>>
>>> [num for elem in L1 for num in elem]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
TypeError: 'int' object is not iterable
>>> 
>>> L1a = [ ([p], p)[isinstance(p, list)] for p in L1 ] ; L1a
[[32], [97], [192, 128], [98], [99], [32], [192, 128], [32]]
>>>
>>> [num for elem in L1a for num in elem]
[32, 97, 192, 128, 98, 99, 32, 192, 128, 32]
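The intermediate list L1a is not strictly necessary. A possible one-step variant, offered as a sketch, wraps each non-list element on the fly with a conditional expression. Unlike the tuple-indexing form, which builds both alternatives before choosing one, a conditional expression evaluates only the branch it selects:

```python
L1 = [32, 97, [192, 128], 98, 99, 32, [192, 128], 32]

flat = [
    num
    for elem in L1
    for num in (elem if isinstance(elem, list) else [elem])
]
print(flat)  # [32, 97, 192, 128, 98, 99, 32, 192, 128, 32]
```

Like the two-step version, this flattens only one level of nesting.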

Nested List Comprehensions

Create a list of 20 consecutive integers to use as sample data:

matrix = [p for p in range (3,23)] # A listcomp does this nicely.

print ('len(matrix) =', len(matrix))
print ('matrix =', matrix)
len(matrix) = 20
matrix = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]

New list of 3 rows

Create a list containing three rows in which each row contains every third value of matrix.

b = [ [matrix[p] for p in range(q,len(matrix),3)] for q in range(3) ]
print ('''
b =
{}, length={}
{}, length={}
{}, length={}
'''.format( b[0],len(b[0]),  b[1],len(b[1]),  b[2],len(b[2]) ))
b =
[3, 6, 9, 12, 15, 18, 21], length=7
[4, 7, 10, 13, 16, 19, 22], length=7
[5, 8, 11, 14, 17, 20], length=6

The nested listcomp above is equivalent to:

b = [ [matrix[p] for p in range(0,len(matrix),3)] ,
      [matrix[p] for p in range(1,len(matrix),3)] ,
      [matrix[p] for p in range(2,len(matrix),3)] ]
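For this particular task the inner listcomp can arguably be replaced by an extended slice, since matrix[q::3] already means "every third value starting at position q". A sketch of that variant (the name rows is illustrative):

```python
matrix = list(range(3, 23))

# One slice per starting offset 0, 1, 2.
rows = [matrix[q::3] for q in range(3)]
for row in rows:
    print(row, 'length =', len(row))
```

The nested form remains the more general tool, because the inner expression can transform each value as well as select it.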

Fill each row as necessary so that all rows have same length:

c = [ b[p]+[None]*q for p in range(len(b)) for q in [ len(b[0]) - len(b[p]) ] ]

The syntax of the listcomp allows only for and if clauses, so there is no direct way to write an assignment. The second for clause above iterates over a one-element list and so binds q to a single value, either 0 or 1.

print ('''
c = 
{}, length={}
{}, length={}
{}, length={}
'''.format( c[0],len(c[0]),  c[1],len(c[1]),  c[2],len(c[2]) ))
c =
[3, 6, 9, 12, 15, 18, 21], length=7
[4, 7, 10, 13, 16, 19, 22], length=7
[5, 8, 11, 14, 17, 20, None], length=7
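Iterating over a one-element list is a general way to bind a name inside a listcomp. The sketch below applies it to a small made-up list rows and pads to the width of the longest row, rather than assuming that the first row is the longest:

```python
rows = [[3, 6, 9], [4, 7, 10], [5, 8]]
width = max(len(row) for row in rows)

padded = [
    row + [None] * pad
    for row in rows
    for pad in [width - len(row)]   # acts like: pad = width - len(row)
]
print(padded)  # [[3, 6, 9], [4, 7, 10], [5, 8, None]]
```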

Create a dictionary

Create a dictionary from a list in which each value at an even position is a key and each value at an odd position is the associated value:

data = [p%q for p in range (3,10) for q in range (2,7)]
print ('data =', data)
data = [1, 0, 3, 3, 3, 0, 1, 0, 4, 4, 1, 2, 1, 0, 5, 0, 0, 2, 1, 0, 1, 1, 3, 2, 1, 0, 2, 0, 3, 2, 1, 0, 1, 4, 3]

Create a list containing two rows in which the first row contains keys and the second contains values.

b = [ [data[p] for p in range(q,len(data),2)] for q in range(2) ]
print ('''
b =
{}, length={}
{}, length={}
'''.format( b[0],len(b[0]),  b[1],len(b[1]) ))
b =
[1, 3, 3, 1, 4, 1, 1, 5, 0, 1, 1, 3, 1, 2, 3, 1, 1, 3], length=18
[0, 3, 0, 0, 4, 2, 0, 0, 2, 0, 1, 2, 0, 0, 2, 0, 4], length=17

Fill as necessary:

c = [ b[p]+[None]*q for p in range(len(b)) for q in range(len(b[0]) - len(b[p]), 1+len(b[0]) - len(b[p])) ]
print ('''
c = 
{}, length={}
{}, length={}
'''.format( c[0],len(c[0]),  c[1],len(c[1]) ))
c =
[1, 3, 3, 1, 4, 1, 1, 5, 0, 1, 1, 3, 1, 2, 3, 1, 1, 3], length=18
[0, 3, 0, 0, 4, 2, 0, 0, 2, 0, 1, 2, 0, 0, 2, 0, 4, None], length=18

Transpose rows and columns and create input to dictionary:

d = [[row[i] for row in c] for i in range(len(c[0]))]

print (' d =', d);
 d = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]
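A transpose of this kind can also be sketched with the built-in zip(), which pairs up the items at the same position in each row. The list c below is a shortened, made-up stand-in for the padded rows:

```python
c = [[1, 3, 3], [0, 3, None]]   # shortened stand-in for the padded rows

# zip(*c) yields one tuple per column; list() turns each into a list.
d = [list(pair) for pair in zip(*c)]
print(d)  # [[1, 0], [3, 3], [3, None]]
```

Because zip() stops at the end of the shortest row, the padding step is still needed before transposing this way.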

List d is equivalent to:

d1 = [ [ c[0][i], c[1][i] ] for i in range(18) ]

print ('d1 =', d1)
d1 = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]

List d1 is equivalent to:

d2 = [
 [ c[0][ 0], c[1][ 0] ] ,
 [ c[0][ 1], c[1][ 1] ] ,
 [ c[0][ 2], c[1][ 2] ] ,
 ........................
 [ c[0][15], c[1][15] ] ,
 [ c[0][16], c[1][16] ] ,
 [ c[0][17], c[1][17] ]	
]

print ('d2 =', d2)
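d2 = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]

Create the dictionary from d. Where a key occurs more than once, the last value assigned to it wins:

e = dict(d)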

print ('e =', e);
e = {1: 4, 3: None, 4: 4, 5: 0, 0: 2, 2: 0}

Listcomps simplified

In practice this code will do the job:

data = [p%q for p in range (3,10) for q in range (2,7)]
print ('data =', data)
data = [1, 0, 3, 3, 3, 0, 1, 0, 4, 4, 1, 2, 1, 0, 5, 0, 0, 2, 1, 0, 1, 1, 3, 2, 1, 0, 2, 0, 3, 2, 1, 0, 1, 4, 3]

Check the input:

if not (isinstance(data, list) and len(data) >= 1):
    exit(99) # Input must be a list that contains at least one key.

status = {
    isinstance(data[p], (int, float, str))
    for p in range(0, len(data), 2)
} # A set comprehension. In this dictionary each key must be int or float or str.

print ('status =', status)
status = {True}
if False in status : 
    print ("'data' contains unrecognized key.")
    exit (98)
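The same check can be sketched with the built-in all() and a generator expression, which stops at the first unacceptable key instead of building a set. The helper name keys_are_valid and the sample inputs are hypothetical:

```python
def keys_are_valid(data):
    """Return True if every item at an even position is an int, float or str."""
    return all(
        isinstance(data[p], (int, float, str))
        for p in range(0, len(data), 2)
    )

print(keys_are_valid([1, 0, 'k', 5, 2.5]))   # True
print(keys_are_valid([1, 0, None, 3]))       # False
```

Note that all() returns True for an empty input, so the separate length check is still needed.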

Create the dictionary

b = dict(
[
    (data+[None])[p:p+2] for p in range(0, len(data), 2)
]
)

print (
'\nlen(data) =',  len(data),
'\nInput to dict()\n   =', [ (data+[None])[p:p+2] for p in range(0, len(data), 2) ],
'\nDictionary =', b)
len(data) = 35
Input to dict()
   = [[1, 0], [3, 3], [3, 0], [1, 0], [4, 4], [1, 2], [1, 0], [5, 0], [0, 2], [1, 0], [1, 1], [3, 2], [1, 0], [2, 0], [3, 2], [1, 0], [1, 4], [3, None]]
Dictionary = {1: 4, 3: None, 4: 4, 5: 0, 0: 2, 2: 0}
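Another way to pair keys with values, shown here only as a sketch, is itertools.zip_longest from the standard library, which fills in None when the list of values is one item short. The input list below is made up for illustration:

```python
from itertools import zip_longest

data = ['a', 1, 'b', 2, 'a', 3, 'c']   # made-up input with an odd length

# data[0::2] holds the keys, data[1::2] the values; zip_longest pads the
# shorter sequence with None, its default fill value.
result = dict(zip_longest(data[0::2], data[1::2]))
print(result)  # {'a': 3, 'b': 2, 'c': None}
```

As with the listcomp version, a repeated key keeps the last value paired with it.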

List Comprehensions for free-format Python

Although a listcomp accepts only for and if clauses, with a little dexterity the equivalent of assignments and else statements can be written inside a listcomp. The advantage is that code between the brackets can be laid out freely across lines, without the usual indentation rules.
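As a small illustration of both devices, the sketch below uses a one-element tuple to stand in for an assignment and a conditional expression to stand in for else. The list words and the labels 'long' and 'short' are invented for the example:

```python
words = ['  Apple ', 'kiwi', '   ', ' FIG', 'banana  ']

labels = [
    (cleaned, 'long' if len(cleaned) > 4 else 'short')   # plays the role of else
    for word in words
    for cleaned in (word.strip().lower(),)   # acts like: cleaned = word.strip().lower()
    if cleaned                               # skip strings that were only white space
]
print(labels)
# [('apple', 'long'), ('kiwi', 'short'), ('fig', 'short'), ('banana', 'long')]
```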

The Unix date command prints the date in the following format:

$ date
Wed Feb 14 08:24:24 CST 2018

The code in this section uses list comprehensions to recognize valid dates. All of the following are considered valid dates:

Wed Feb 14 08:24:24 CST 2018
Wednes Feb 14 08:24:24 CST 2018 # More than 3 letters in name of day.
Wed Febru 14 08:24:24 CST 2018 # More than 3 letters in name of month.
Wed Feb 14 8:24 : 24 CST 2018 # White space in hh:mm:ss.
wed FeB 14 8:24 : 24 cSt 2018 # Irregular capitalization.

Build dictionary months:

mo = '''January February March April 
May June July August September 
October November December 
'''

L1 = [
[ month[:3], {month[:p] for p in range (3,len(month)+1)} ]
for month in mo.title().split()
]

months = dict(L1)

Display dictionary months:

L1 = [
'''months['{}'] = {}'''.format(key, months[key])
for key in months
]

print ( '\n'.join(L1) )
months['Jan'] = {'Januar', 'Janua', 'Janu', 'Jan', 'January'}
months['Feb'] = {'Februa', 'Febru', 'Februar', 'Feb', 'Febr', 'February'}
months['Mar'] = {'Mar', 'March', 'Marc'}
months['Apr'] = {'April', 'Apri', 'Apr'}
months['May'] = {'May'}
months['Jun'] = {'June', 'Jun'}
months['Jul'] = {'Jul', 'July'}
months['Aug'] = {'Augus', 'Augu', 'Aug', 'August'}
months['Sep'] = {'Sep', 'Septemb', 'September', 'Septem', 'Septe', 'Septembe', 'Sept'}
months['Oct'] = {'Octo', 'Octobe', 'Oct', 'Octob', 'October'}
months['Nov'] = {'Nove', 'November', 'Novemb', 'Novem', 'Nov', 'Novembe'}
months['Dec'] = {'Decembe', 'Decemb', 'Dece', 'Decem', 'December', 'Dec'}
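A dictionary like this can also be built in one step with a dictionary comprehension, without the intermediate list of pairs. A sketch, shortened to three month names to keep the output small (the name names is illustrative):

```python
names = 'January February March'.title().split()

# Key: first three letters. Value: every prefix of length 3 or more.
months = {
    name[:3]: {name[:p] for p in range(3, len(name) + 1)}
    for name in names
}
print(sorted(months['Mar']))  # ['Mar', 'Marc', 'March']
```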

Build dictionary days:

da='''  
Sunday Monday Tuesday 
Wednesday Thursday 
Friday Saturday 
'''

L1 = [
[ day[:3], {day[:p] for p in range (3,len(day)+1)} ]
for day in da.title().split()
]

days = dict(L1)

Display dictionary days:

L1 = [
'''days['{}'] = {}'''.format(key, days[key])
for key in days
]

print ( '\n'.join(L1) )
days['Sun'] = {'Sund', 'Sunda', 'Sun', 'Sunday'}
days['Mon'] = {'Monday', 'Mon', 'Mond', 'Monda'}
days['Tue'] = {'Tuesda', 'Tues', 'Tuesday', 'Tue', 'Tuesd'}
days['Wed'] = {'Wednesda', 'Wednesd', 'Wedn', 'Wednes', 'Wed', 'Wedne', 'Wednesday'}
days['Thu'] = {'Thursday', 'Thur', 'Thu', 'Thursd', 'Thurs', 'Thursda'}
days['Fri'] = {'Friday', 'Fri', 'Frida', 'Frid'}
days['Sat'] = {'Saturday', 'Satu', 'Saturda', 'Sat', 'Saturd', 'Satur'}

The regular expression:

reg2 = (
r'''\b # Word boundary.                           
(?P<day>\w{3,}) # At least 3 word characters.     
\s+                                               
(?P<month>\w{3,}) # At least 3 word characters.   
\s+                                               
(?P<date>\d{1,2}) # 1 or 2 numbers.               
\s+                                               
(?P<hours>\d{1,2}) #  1 or 2 numbers.             
\s*:\s*                                           
(?P<minutes>\d{1,2}) #  1 or 2 numbers.           
\s*:\s*                                           
(?P<seconds>\d{1,2}) #  1 or 2 numbers.           
\s+                                               
(?P<time_zone>\w{3}) # 3 word characters          
\s+                                               
(?P<year>\d{4}) # 4 numbers                       
\b # Word boundary.'''
)
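A pattern written in this commented, multi-line style only works when the re.VERBOSE flag is supplied; without it the white space and comments are treated as literal text. The sketch below uses a shortened, hypothetical pattern covering only hours and minutes to show how the named groups are read back:

```python
import re

# A shortened pattern in the same style: only the hh:mm part.
time_pattern = r'''
(?P<hours>\d{1,2})    # 1 or 2 digits.
\s*:\s*
(?P<minutes>\d{1,2})  # 1 or 2 digits.
'''

m = re.search(time_pattern, 'at 8 : 24 today', re.VERBOSE)
print(m.groupdict())  # {'hours': '8', 'minutes': '24'}
```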

Dictionary that contains number of days per month:

d1 = dict ((
    ('Jan', 31),    ('May', 31),    ('Sep', 30),
    ('Feb', 28),    ('Jun', 30),    ('Oct', 31),
    ('Mar', 31),    ('Jul', 31),    ('Nov', 30),
    ('Apr', 30),    ('Aug', 31),    ('Dec', 31),
))

List all valid dates in string dates below.

dates = ''' 
MON Februar 12 0:30 : 19 CST 2018 
Tue    Feb  33      00:30:19       CST      2018 # Invalid.
Wed    Feb     29   00:30:19       CST      1900 # Invalid.  
Thursda               feb             29                  00:30:19           CST            1944    
'''

The list comprehension that does it all:

import re

L1 = [
'\n'.join(( str(m), m[0], str(m.groupdict()) ))

for m in re.finditer(reg2, dates, re.IGNORECASE|re.VERBOSE|re.ASCII)

for day in (m['day'].title(),) # Equivalent to assignment: day = m['day'].title()
if day[:3] in days
if day in days[day[:3]]

for month in ( m['month'].title() ,)
if month[:3] in months
if month in months[month[:3]]

for date in ( int(m['date']) ,) if date >= 1
for hours in ( int(m['hours']) ,) if hours <= 23
for minutes in ( int(m['minutes']) ,) if minutes <= 59
for seconds in ( int(m['seconds']) ,) if seconds <= 59
for zone in (m['time_zone'] ,) if zone.upper() in ('EST', 'EDT', 'CST', 'CDT', 'MST', 'MDT', 'PST', 'PDT' )

for year in ( int(m['year']) ,) if year >= 1900 and year <= 2020

for leap_year in (        # 'else' in a listcomp  
    (                     # equivalent to:  
        year % 4 == 0,    # if year % 100 == 0:  
        year % 400 == 0   #     leap_year = year % 400 == 0  
    )[year % 100 == 0]    # else :  
,)                        #     leap_year = year % 4 == 0  

for max_date in (                         # if (month[:3] == 'Feb') and leap_year :   
    (                                     #     max_date = 29  
        d1[month[:3]],                    # else : 
        29                                #     max_date = d1[month[:3]]  
    )[(month[:3] == 'Feb') and leap_year] # 
,)

if date <= max_date
]

print ( '\n\n'.join(L1) )
<_sre.SRE_Match object; span=(2, 35), match='MON Februar 12 0:30 : 19 CST 2018'>
MON Februar 12 0:30 : 19 CST 2018
{'day': 'MON', 'month': 'Februar', 'date': '12', 'hours': '0', 'minutes': '30', 'seconds': '19', 'time_zone': 'CST', 'year': '2018'}

<_sre.SRE_Match object; span=(159, 255), match='Thursda               feb             29         >
Thursda               feb             29                  00:30:19           CST            1944
{'day': 'Thursda', 'month': 'feb', 'date': '29', 'hours': '00', 'minutes': '30', 'seconds': '19', 'time_zone': 'CST', 'year': '1944'}
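The leap-year and days-per-month rules used inside the listcomp can be cross-checked against the standard library's calendar module. The sketch below restates the rule as a conditional expression; the helper names is_leap and max_date are hypothetical and mirror the names used in the listcomp:

```python
import calendar

def is_leap(year):
    # Century years must be divisible by 400, other years by 4.
    return year % 400 == 0 if year % 100 == 0 else year % 4 == 0

def max_date(year, month_number):
    # calendar.monthrange returns (weekday of the 1st, number of days in month).
    return calendar.monthrange(year, month_number)[1]

print(is_leap(1900), is_leap(1944))          # False True
print(max_date(1900, 2), max_date(1944, 2))  # 28 29
```

This agrees with the output above: 29 February is rejected for 1900 and accepted for 1944.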

Assignments


Further Reading or Review

References

1. Python's documentation:

"5.1.3. List Comprehensions", "5.1.4. Nested List Comprehensions"

