List, Dict and Set Comprehensions By Example

One type of syntactic sugar that sets Python apart from more verbose languages is comprehensions. Comprehensions are a special notation for building up lists, dictionaries and sets from other lists, dictionaries and sets, modifying and filtering them in the process.

They allow you to express complicated looping logic in a tiny amount of space.

List Comprehensions

List comprehensions are the best known and most widely used. Let’s start with an example.

A common programming task is to iterate over a list and transform each element in some way, e.g.:
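A minimal sketch of such a loop, assuming a hypothetical list called numbers whose elements we want to square:

```python
# Hypothetical input data, chosen only for illustration.
numbers = [1, 2, 3, 4, 5]

# Build the result by appending one transformed element at a time.
squares = []
for n in numbers:
    squares.append(n * n)

print(squares)  # [1, 4, 9, 16, 25]
```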

That’s the kind of thing you might do if you were a Java programmer. Luckily for us, though, list comprehensions allow the same idea to be expressed in far fewer lines.

The basic syntax for list comprehensions is this: [EXPRESSION FOR ELEMENT IN SEQUENCE].
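Applied to a hypothetical list called numbers, the pattern might look like this:

```python
# Hypothetical input data, chosen only for illustration.
numbers = [1, 2, 3, 4, 5]

# EXPRESSION is n * n, ELEMENT is n, SEQUENCE is numbers.
squares = [n * n for n in numbers]

print(squares)  # [1, 4, 9, 16, 25]
```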

Another common task is to filter a list and create a new list composed of only the elements that pass a certain condition. The next snippet constructs a list of every number from 0 to 9 that leaves a remainder of zero when divided by 2, i.e. every even number.
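One way to write that filter, as a short sketch (the variable name evens is just illustrative):

```python
# The trailing "if" keeps only the elements for which the test is true.
evens = [n for n in range(10) if n % 2 == 0]

print(evens)  # [0, 2, 4, 6, 8]
```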

Using an IF-ELSE construct works slightly differently to what you might expect. Instead of putting the ELSE at the end, you need to use the ternary operator (x if y else z) as the expression at the start of the comprehension.

The following list comprehension generates the squares of even numbers and the cubes of odd numbers in the range 0 to 9.
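A sketch of what that comprehension might look like (the name powers is an assumption):

```python
# The conditional expression comes first; the "for" clause follows it.
powers = [n ** 2 if n % 2 == 0 else n ** 3 for n in range(10)]

print(powers)  # [0, 1, 4, 27, 16, 125, 36, 343, 64, 729]
```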

List comprehensions can also be nested inside each other. Here is how we can generate a two-dimensional list, populated with zeros. (I have wrapped the comprehension in pprint to make the output more legible.)
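A minimal sketch, assuming a grid of 3 rows and 4 columns (both sizes are arbitrary). A narrow width is passed to pprint so that the rows are not all printed on a single line:

```python
from pprint import pprint

rows, cols = 3, 4  # arbitrary dimensions for illustration

# The inner comprehension builds one row; the outer one repeats it per row.
grid = [[0 for _ in range(cols)] for _ in range(rows)]

pprint(grid, width=20)
```

Because the inner comprehension runs once per row, every row is a separate list object, so changing one row does not affect the others.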

(As you have probably noticed, it is possible to create list comprehensions that are utterly illegible, so please think about who has to touch your code after you and exercise some restraint.)

On the other hand, the syntax of basic comprehensions might seem complicated to you now, but I promise that with time it will become second nature.

Generator Expressions

A list comprehension creates an entire list in memory. In many cases, that’s what you want because you want to iterate over the list again or otherwise manipulate it after it has been created. In other cases, however, you don’t want the list at all. Generator expressions (described in PEP 289) were added for this purpose.

Let’s say you want to calculate the sum of the squares of a range of numbers. Without generator expressions, you would do this:
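For instance, with a list comprehension (the range 0 to 9 is an arbitrary choice):

```python
# The square brackets build a full list before sum() sees any element.
total = sum([n * n for n in range(10)])

print(total)  # 285
```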

That creates a list in memory just to throw it away once the reference to it is no longer needed, which is wasteful. Generator expressions are essentially a way to define an anonymous generator function and call it, allowing you to ditch the square brackets and write this:
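A sketch of the generator-expression form, using an arbitrary range of 0 to 9:

```python
# No brackets: sum() pulls one square at a time and no list is stored.
total = sum(n * n for n in range(10))

print(total)  # 285
```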

They are also useful for other aggregate functions like min and max.
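As an illustration, here is a sketch that finds the shortest and longest word lengths in a hypothetical list of words:

```python
# Hypothetical word list, used only for illustration.
words = ["comprehension", "list", "generator"]

# No intermediate list is built; each length is consumed as it is produced.
shortest = min(len(word) for word in words)
longest = max(len(word) for word in words)

print(shortest, longest)  # 4 13
```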

The set and dict constructors can take generator expressions too:
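A short sketch with a hypothetical list of words. The dict constructor expects the generator to yield (key, value) pairs:

```python
# Hypothetical word list, used only for illustration.
words = ["a", "bb", "cc"]

# set() consumes the generator and drops duplicate lengths.
unique_lengths = set(len(word) for word in words)

# dict() consumes (key, value) tuples.
lengths = dict((word, len(word)) for word in words)

print(unique_lengths)  # {1, 2}
print(lengths)         # {'a': 1, 'bb': 2, 'cc': 2}
```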

Dict Comprehensions

On top of list comprehensions, Python now supports dict comprehensions, which allow you to express the creation of dictionaries at runtime using a similarly concise syntax.

A dictionary comprehension takes the form {key: value for (key, value) in iterable}. This syntax was introduced in Python 3 and backported as far as Python 2.7, so you should be able to use it regardless of which version of Python you have installed.

A canonical example is taking two lists and creating a dictionary where the item at each position in the first list becomes a key and the item at the corresponding position in the second list becomes the value.
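A minimal sketch with two hypothetical lists, keys and values:

```python
# Hypothetical input lists of equal length.
keys = ["a", "b", "c", "d"]
values = [1, 2, 3, 4]

# zip pairs up the items position by position.
mapping = {k: v for (k, v) in zip(keys, values)}

print(mapping)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4} on Python 3.7+
```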

(On Python versions before 3.7 the result may come out in a jumbled order, because dicts had no guaranteed ordering. From 3.7 onwards, dicts preserve insertion order.)

The zip function used inside this comprehension returns an iterator of tuples, where each element in the tuple is taken from the same position in each of the input iterables. In the example above, the returned iterator contains the tuples ("a", 1), ("b", 2), etc.

Any iterable can be used in a dict comprehension, including strings. The following code might be useful if you wanted to generate a dictionary that stores letter frequencies, for instance.
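A sketch of that idea, assuming a hypothetical input string. Each letter is counted with str.count, so repeated letters simply overwrite the same key:

```python
text = "mississippi"  # hypothetical input

# Iterating over a string yields its characters one at a time.
frequencies = {letter: text.count(letter) for letter in text}

print(frequencies)  # {'m': 1, 'i': 4, 's': 4, 'p': 2}
```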

(The code above is just an example of using a string as an iterable inside a comprehension. If you really want to count letter frequencies, you should check out collections.Counter.)

Dict comprehensions can use complex expressions and IF-ELSE constructs too. This one maps the numbers in a specific range to their cubes:
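For example, assuming the range 1 to 5 (the bounds are an arbitrary choice):

```python
# Each number becomes a key and its cube becomes the value.
cubes = {n: n ** 3 for n in range(1, 6)}

print(cubes)  # {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}
```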

And this one omits cubes that are not divisible by 4:
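A sketch of the filtered version, again assuming the range 1 to 5:

```python
# The trailing "if" drops every entry whose cube is not a multiple of 4.
cubes_div_by_4 = {n: n ** 3 for n in range(1, 6) if n ** 3 % 4 == 0}

print(cubes_div_by_4)  # {2: 8, 4: 64}
```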

Set Comprehensions

A set is an unordered collection of elements in which each element can only appear once. Although sets have existed in Python since 2.4, Python 3 introduced the set literal syntax.
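A quick sketch of the literal syntax, using hypothetical contents. Note that empty braces create a dict, so an empty set still has to be written as set():

```python
# A set literal: the duplicate "a" is discarded when the set is built.
vowels = {"a", "e", "i", "o", "u", "a"}
print(len(vowels))  # 5

# {} is an empty dict, not an empty set.
print(type({}).__name__, type(set()).__name__)  # dict set
```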

Python 3 also introduced set comprehensions.

Prior to this, you could use the set built-in function.

The syntax for set comprehensions is almost identical to that of list comprehensions, but it uses curly brackets instead of square brackets. The pattern is {EXPRESSION FOR ELEMENT IN SEQUENCE}.

The result of a set comprehension is the same as passing the output of the equivalent list comprehension to the set function.
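The following sketch compares the three spellings on a hypothetical list of words: the set comprehension, the set built-in with a generator expression, and the set built-in with a list comprehension.

```python
# Hypothetical word list, used only for illustration.
words = ["apple", "Avocado", "banana", "Blueberry"]

# Set comprehension: curly brackets, no key: value pair.
initials = {word[0].lower() for word in words}

# The older spellings produce the same set.
initials_from_generator = set(word[0].lower() for word in words)
initials_from_list = set([word[0].lower() for word in words])

print(initials == initials_from_generator == initials_from_list)  # True
```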

That’s it for the theory. Now let’s dissect some examples of comprehensions.

Examples

List of files with the .png extension

The os module contains a function called listdir that returns a list of filenames in a given directory. We can use the endswith method on the strings to filter the list of files.

Here it is in use:
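A sketch of such a function together with a call to it. The function name list_png_files is an assumption, and the example builds a throwaway directory so that it runs anywhere:

```python
import os
import tempfile


def list_png_files(directory):
    """Return the names in `directory` that end with '.png'."""
    return [name for name in os.listdir(directory) if name.endswith(".png")]


# Usage: create a temporary directory with a few empty files in it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("logo.png", "notes.txt", "icon.png"):
        open(os.path.join(tmp, name), "w").close()
    # os.listdir returns names in arbitrary order, so sort for display.
    print(sorted(list_png_files(tmp)))  # ['icon.png', 'logo.png']
```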

Merge two dictionaries

Merging two dictionaries together can be achieved easily in a dict comprehension:

Here is merge_dicts in action:
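One possible shape for merge_dicts, offered as a sketch rather than the original listing. It loops over both dictionaries in turn, so keys in the second dictionary win when a key appears in both:

```python
def merge_dicts(d1, d2):
    """Return a new dict holding the items of d1 and d2 (d2 wins on clashes)."""
    return {k: v for d in (d1, d2) for k, v in d.items()}


# Usage with two hypothetical dictionaries that share the key "b".
merged = merge_dicts({"a": 1, "b": 2}, {"b": 3, "c": 4})
print(merged)  # {'a': 1, 'b': 3, 'c': 4}
```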

Sieve of Eratosthenes

The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers. You might remember it from school. It works like this:

  • Starting at 2, which is the first prime number, exclude all multiples of 2 up to n.
  • Move on to 3. Exclude all multiples of 3 up to n.
  • Keep going like that until you reach n.

And here’s the code:
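A sketch of the steps above using two set comprehensions. The function name eratosthenes is an assumption; not_primes collects every multiple that gets excluded:

```python
def eratosthenes(n):
    """Return the set of primes less than or equal to n."""
    # Double loop: i is the outer loop, j the inner one. The third
    # argument to the rightmost range() is the step, so j takes the
    # values 2*i, 3*i, ... up to n.
    not_primes = {j for i in range(2, n + 1) for j in range(i * 2, n + 1, i)}
    # Whatever was never excluded is prime.
    return {i for i in range(2, n + 1) if i not in not_primes}


print(sorted(eratosthenes(30)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```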

The first thing to note about the function is the use of a double loop in the first set comprehension. Contrary to what you might expect, the leftmost loop is the outer loop and the rightmost loop is the inner loop. The pattern for double loops in list comprehensions is [x for b in a for x in b].
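To make the loop order concrete, here is a sketch that flattens a hypothetical nested list with that pattern and with the equivalent nested for loops:

```python
a = [[1, 2], [3], [4, 5]]  # hypothetical nested list

# Leftmost "for" is the outer loop, rightmost "for" is the inner loop.
flat = [x for b in a for x in b]

# The same thing written as ordinary nested loops.
flat_loop = []
for b in a:
    for x in b:
        flat_loop.append(x)

print(flat)  # [1, 2, 3, 4, 5]
```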

In case you hadn’t seen it before, the third argument in the rightmost call to range represents the step size.

It would be possible to use a list comprehension for this algorithm, but the not_primes list would be filled with duplicates. It is better to use the automatic deduplication behaviour of the set to avoid that.

Exercises

I’ve included some exercises to help you solidify your new knowledge of comprehensions.

1. Write a function called generate_matrix that takes two positional arguments – m and n – and a keyword argument default that specifies the value for each position. It should use a nested list comprehension to generate a list of lists with the given dimensions. If default is provided, each position should have the given value, otherwise the matrix should be populated with zeroes.

2. Write a function called initcap that replicates the functionality of the str.title method, except better. Given a string, it should split the string on whitespace, capitalize each element of the resulting list and join them back into a string. Your implementation should use a list comprehension.

3. Write a function called make_mapping that takes two lists of equal length and returns a dictionary that maps the values in the first list to the values in the second. The function should also take an optional keyword argument called exclude, which expects a list. Values in the list passed as exclude should be omitted as keys in the resulting dictionary.

4. Write a function called compress_dict_keys that takes a dictionary with string keys and returns a new dictionary with the vowels removed from the keys. For instance, the dictionary {"foo": 1, "bar": 2} should be transformed into {"f": 1, "br": 2}. The function should use a list comprehension nested inside a dict comprehension.

5. Write a function called dedup_surnames that takes a list of surnames and returns a set of surnames with the case normalized to uppercase. For instance, the list ["smith", "Jones", "Smith", "BROWN"] should be transformed into the set {"SMITH", "JONES", "BROWN"}.

Solutions

1. Nest two list comprehensions to generate a 2D list with m rows and n columns. Use default for the value in each position in the inner comprehension.

2. Disassemble the sentence passed into the function using split, then call capitalize on each word, then use join to reassemble the sentence.

3. Join the two lists a and b using zip, then use the zipped pairs in the dictionary comprehension, skipping any key that appears in exclude.

4. Iterate over the key-value pairs from the passed-in dictionary and, for each key, remove the vowels using a comprehension with an IF construct.

5. Use the set comprehension syntax (with curly brackets) to iterate over the given list and call upper on each name in it. The deduplication will happen automatically due to the nature of the set data structure.
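The five solutions described above might be sketched as follows. These are one possible set of answers, not the only correct ones. Parameter names other than those given in the exercises are assumptions.

```python
def generate_matrix(m, n, default=0):
    """Return an m-by-n list of lists filled with `default`."""
    return [[default for _ in range(n)] for _ in range(m)]


def initcap(sentence):
    """Capitalize each whitespace-separated word of `sentence`."""
    return " ".join([word.capitalize() for word in sentence.split()])


def make_mapping(a, b, exclude=None):
    """Map items of `a` to items of `b`, skipping keys listed in `exclude`."""
    exclude = exclude or []
    return {k: v for k, v in zip(a, b) if k not in exclude}


def compress_dict_keys(d):
    """Return a copy of `d` with the vowels removed from every key."""
    return {
        "".join([c for c in key if c.lower() not in "aeiou"]): value
        for key, value in d.items()
    }


def dedup_surnames(names):
    """Return the set of surnames in `names`, normalized to uppercase."""
    return {name.upper() for name in names}


print(generate_matrix(2, 3))                   # [[0, 0, 0], [0, 0, 0]]
print(initcap("they're over here"))            # They're Over Here
print(make_mapping("abc", [1, 2, 3], exclude=["b"]))  # {'a': 1, 'c': 3}
print(compress_dict_keys({"foo": 1, "bar": 2}))       # {'f': 1, 'br': 2}
print(dedup_surnames(["smith", "Jones", "Smith", "BROWN"]) == {"SMITH", "JONES", "BROWN"})  # True
```

The initcap call shows where it differs from str.title, which would turn "they're" into "They'Re".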

I’ll leave it there for now. If you’ve worked your way through this post and given the exercises a good try, you should be ready to use comprehensions in your own code.

If you’ve got any questions or other remarks, let me know in the comments.

How exactly do context managers work?

Context managers (PEP 343) are pretty important in Python. You probably use one every time you open a file:
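A small sketch of that familiar pattern. The file name is hypothetical, and a temporary directory is used so that the example leaves nothing behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("hello")
    # The file is closed as soon as the inner with block ends.
    print(f.closed)  # True
```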

But how well do you understand what’s going on behind the scenes?

Context manager classes

It’s actually quite simple. A context manager is an object whose class implements an __enter__ and an __exit__ method.

Let’s imagine you want to print a line of text to the console surrounded with asterisks. Here’s a context manager to do it:
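A minimal sketch of such a class, followed by a short usage example. The width of 32 asterisks is an arbitrary choice:

```python
class asterisks:
    """Print a row of asterisks before and after the with block."""

    def __enter__(self):
        print("*" * 32)

    def __exit__(self, exc_type, exc_value, traceback):
        # The three arguments are None when the block finished normally.
        # Returning None lets any exception from the block propagate.
        print("*" * 32)


with asterisks():
    print("Context managers rock!")
```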

The __exit__ method takes three arguments apart from self. Those arguments contain information about any errors that occurred inside the with block.

You can use asterisks in a with statement, in the same way as any of the built-in context managers.

Accessing the context inside the with block

If you need to get something back and use it inside the with block – such as a file descriptor – you simply return it from __enter__:

Inside a with statement, myopen is used just like the built-in open:
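A sketch of what myopen might look like, with a usage example. Only the path and mode arguments are handled here, which is a simplification compared with the real open. The file name is hypothetical:

```python
import os
import tempfile


class myopen:
    """A bare-bones stand-in for open() when used in a with statement."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode

    def __enter__(self):
        # Whatever __enter__ returns is bound to the name after "as".
        self.file = open(self.path, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    with myopen(path, "w") as f:
        f.write("hello")
    with myopen(path) as f:
        print(f.read())  # hello
```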

The contextmanager decorator

Thankfully, you don’t have to implement a class every time. The contextlib module has a contextmanager decorator that you can apply to generator functions to automatically transform them into context managers:
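A sketch of a generator-based context manager that prints a row of asterisks around the block. The try/finally is an addition so that the closing row is printed even if the block raises:

```python
from contextlib import contextmanager


@contextmanager
def asterisks():
    print("*" * 32)      # runs on entering the with block
    try:
        yield            # the body of the with block runs here
    finally:
        print("*" * 32)  # runs on leaving the with block


with asterisks():
    print("Context managers rock!")
```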

The code before yield corresponds to __enter__ and the code after yield corresponds to __exit__. A context manager generator should have exactly one yield in it.

It works the same as the class version.

Roll your own contextmanager decorator

The implementation in contextlib is complicated, but it’s not hard to write something that works similarly with the exception of a few edge cases:
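A simplified sketch along those lines. It is not the contextlib implementation. In particular, it does not pass exceptions from the with block into the generator. The tag generator at the end is a hypothetical example used only to exercise the decorator:

```python
def contextmanager(generator_function):
    class CMWrapper:
        def __init__(self, generator):
            self.generator = generator

        def __enter__(self):
            # Run the generator up to its yield and hand back the value.
            return next(self.generator)

        def __exit__(self, exc_type, exc_value, traceback):
            # Resume after the yield; a finished generator raises
            # StopIteration, which is expected and swallowed here.
            try:
                next(self.generator)
            except StopIteration:
                pass

    def inner(*args, **kwargs):
        # Call the generator function and wrap the resulting generator.
        return CMWrapper(generator_function(*args, **kwargs))

    return inner


@contextmanager
def tag(name):  # hypothetical example generator
    print("<%s>" % name)
    yield name
    print("</%s>" % name)


with tag("p") as t:
    print("inside", t)
```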

It’s not as robust as the real implementation, but it should be understandable. Here are the key points:

  • The inner function creates an instance of the nested CMWrapper class with a handle on the generator passed into the decorator.
  • __enter__ calls next() on the generator and returns the yielded value so it can be used in the with block.
  • __exit__ calls next() again and catches the StopIteration exception that the generator raises when it finishes.

That’s it for now. If you want to learn more about context managers, I recommend you take a look at the code for contextlib.