Python descriptors made simple

Descriptors, introduced in Python 2.2, provide a way to add managed attributes to objects. They are not used much in everyday programming, but it’s important to learn them to understand a lot of the “magic” that happens in the standard library and third-party packages.

The problem

Imagine we are running a bookshop with an inventory management system written in Python. The system contains a class called Book  that captures the author, title and price of physical books.

Our simple Book class works fine for a while, but eventually bad data starts to creep into the system. The system is full of books with negative prices or prices that are too high because of data entry errors. We decide that we want to limit book prices to values between 0 and 100. In addition, the system contains a Magazine class that suffers from the same problem, so we want our solution to be easily reusable.

This tutorial is pretty long. Want a PDF?

Just type in your email address and I'll send a PDF version to your inbox.

Powered by ConvertKit

The descriptor protocol

The descriptor protocol is simply a set of methods a class must implement to qualify as a descriptor. There are three of them:

  • __get__(self, instance, owner)
  • __set__(self, instance, value)
  • __delete__(self, instance)

__get__ accesses a value stored in the object and returns it.

__set__ sets a value stored in the object and returns nothing.

__delete__ deletes a value stored in the object and returns nothing.

Using these methods, we can write a descriptor called Price that limits the value stored in it to between 0 and 100.

A few details in the implementation of Price deserve mentioning.

An instance of a descriptor must be added to a class as a class attribute, not as an instance attribute. Therefore, to store different data for each instance, the descriptor needs to maintain a dictionary that maps instances to instance-specific values. In the implementation of Price, that dictionary is self.values.

A normal Python dictionary stores references to objects it uses as keys. Those references by themselves are enough to prevent the object from being garbage collected. To prevent Book instances from hanging around after we are finished with them, we use the WeakKeyDictionary from the weakref standard module. Once the last strong reference to the instance passes away, the associated key-value pair will be discarded.

Using descriptors

As we saw in the last section, descriptors are linked to classes, not to instances, so to add a descriptor to the Book class, we must add it as a class variable.

The price constraint for books is now enforced.

How descriptors are accessed

So far we’ve managed to implement a working descriptor that manages the price attribute on our Book class, but how it works might not be clear. It all feels a bit too magical, but not to worry. It turns out that descriptor access is quite simple:

  • When we try to evaluate b.price and retrieve the value, Python recognizes that price is a descriptor and calls Book.price.__get__.
  • When we try to change the value of the price attribute, e.g. b.price = 23 , Python again recognizes that price is a descriptor and substitutes the assignment with a call to Book.price.__set__.
  • And when we try to delete the price attribute stored against an instance of Book, Python automatically interprets that as a call to Book.price.__delete__.

The number 1 descriptor gotcha

Unless we fully understand the fact that descriptors are linked to classes and not to instances, and therefore need to maintain their own mapping of instances to instance-specific values, we might be tempted to write the Price descriptor as follows:

But once we start instantiating multiple Book instances, we’re going to have a problem.

The key is to understand that there is only one instance of Price for Book, so every time the value in the descriptor is changed, it changes for all instances. That behaviour in itself is useful for creating managed class attributes, but it is not what we want in this case. To store separate instance-specific values, we need to use the WeakRefDictionary.

The property built-in function

Another way of building descriptors is to use the property built-in function. Here is the function signature:

fget, fset and fdel are methods to get, set and delete attributes, respectively. doc is a docstring.

Instead of defining a single class-level descriptor object that manages instance-specific values, property works by combining instance methods from the class. Here is a simple example of a Publisher class from our inventory system with a managed name property. Each method passed into property has a print statement to illustrate when it is called.

If we make an instance of Publisher and access the name attribute, we can see the appropriate methods being called.

That’s it for this basic introduction to descriptors. If you want a challenge, take what you have learned and try to reimplement the @property decorator. There is enough information in this post to allow you to figure it out.

A quick guide to nonlocal in Python 3

Python 3 introduced the nonlocal  keyword that allows you to assign to variables in an outer, but non-global, scope. An example will illustrate what I mean.

msg  is declared in the outside function and assigned the value "Outside!". Then, in the inside function, the value "Inside!" is assigned to it. When we run outside, msg has the value "Inside!" in the inside function, but retains the old value in the outside function.

We see this behaviour because Python hasn’t actually assigned to the existing msg variable, but has created a new variable called msg in the local scope of inside that shadows the name of the variable in the outer scope.

Preventing that behaviour is where the nonlocal keyword comes in.

Now, by adding nonlocal msg to the top of inside, Python knows that when it sees an assignment to msg, it should assign to the variable from the outer scope instead of declaring a new variable that shadows its name.

The usage of nonlocal is very similar to that of global, except that the former is used for variables in outer function scopes and the latter is used for variable in the global scope.

Some confusion might arise about when nonlocal should be used. Take the following function, for instance.

It would be reasonable to expect that without using nonlocal the insertion of the "inside": 2 key-value pair in the dictionary would not be reflected in outside. Reasonable, but incorrect, because the dictionary insertion is not an assignment, but a method call. In fact, inserting a key-value pair into a dictionary is equivalent to calling the __setitem__ method on the dictionary object.

I will leave it there for now. If you want to learn more about the nonlocal keyword, check out PEP 3104.

The two ways to sort a list in Python

Today I’m going to take a look at another element of the language that tends to trip up Python beginners – the difference between sorted(my_list)  and my_list.sort().

The built-in function sorted sorts the list that is passed into it, and returns a new list while preserving the old one.

On the other hand, the sort method on list objects sorts the list in place, destroying the original ordering.

Using a list’s sort method is the equivalent assigning the output of sorted back to the original list.

However, that particular way of doing things is frowned upon. Only use sorted

sorted and list.sort both accept the key and reverse parameters. The cmp parameter, which allowed you to pass in a custom comparator function, has been removed in Python 3. key should be used instead.

The difference between range and xrange in Python

Today I’m going to take a look at another difference between Python 2 and 3 that can trip up people making the switch. Python 2 used to have two functions that could be used to iterate a certain number of times in for  loops, range  and xrange . In Python 3, there is no xrange , but the range  function behaves like xrange  in Python 2.

The way things were

You probably remember that in Python 2 you could generate indexes in for  loops in two ways:

The difference between these two built in functions is not immediately obvious when used in this way. Let’s take a look at the output of each function in the interactive interpreter.

As you can see, range  returns a normal list , but xrange  returns an xrange  object. An xrange  object is similar to a generator: it produces the necessary index on demand instead of producing the entire list up front. Therefore it can be slightly faster and more memory efficient. According to the Python 2 documentation, the xrange  type offers the following guarantee:

The advantage of the xrange type is that an xrange object will always take the same amount of memory, no matter the size of the range it represents.

xrange deprecated in Python 3

In Python 3, xrange  has been removed and the only option for generating iterable sequences of consecutive numbers is range . Actually, it is more correct to say that the Python 2 range  function has been removed and xrange  has been renamed to range .

For the most part, this change is easy to handle: just use range  when you would have used either range  or xrange  in Python 2. The only place you might be tripped up is if you actually need the list  that range  used to return. Luckily, all you have to do in that case is pass the Python 3 range  object to the list  constructor function.

Pythonic iteration

Before I finish, I’ll just mention a way to make your code more Pythonic. Quite frequently, when people want to use any of the range functions, it is because they want a way to index another sequence type, e.g.

Sometimes people even declare a counter variable outside the loop, just so they have an index.

There is no need to do either of these things. In particular, that range(len(seq))  idiom is one of classic markers of amateur Python code. What you really need is the enumerate  function, which automatically generates an index for whatever sequence you are iterating over.

Ta-da! Once you start using enumerate , you’ll never go back.

Private methods and attributes in Python

Unlike Java, which enforces access restrictions on methods and attributes, Python takes the view that we are all adults and should be allowed to use the code as we see fit. Nevertheless, the language provides a few facilities to indicate which methods and attributes are public and which are private, and some ways to dissuade people from accessing and using private things.

Normal attribute access

Let’s take a look at how normal attribute access works.

As we can see, there are no restrictions on accessing or assigning to the bar  attribute of our instance. The attribute is also included in __dict__ .

Making it private

Now let’s make bar  “private”. We can do that by adding two leading underscores to the name.

What has happened here is that the name of __bar  has been changed by the  interpreter so that it is not easily accessible outside the class. If we take a look at __dict__  again, we will see that it has been renamed to _Foo__bar , and can be accessed and assigned using that name.

This is called “name mangling”. Attributes whose names start with two underscores are renamed in the format _classname__attrname .

We only have to use the mangled name outside the class. Inside, we access the attribute in the normal way.

Getters and setters

After learning about “private” attributes, sometimes new Python programmers get the idea that they can use getters and setters to manage accessing and assigning attributes, so they write something like this.

It might work, but it’s not Python. Direct attribute access is the natural and Pythonic way to do things, so any solution to mediated attribute access should maintain that interface. There are a few ways to do it, such as overriding __getattr__  and __setattr__ , but the best way is to use managed attributes.

Here we have created a managed bar  attribute that stores its data in the private __bar  attribute. When getting and setting the value of __bar , we can run whatever code we want for validation, logging, etc., provided we go through the interface provided by the two decorated bar  functions. Useful, eh?

Private methods

Methods can be made private in the same way, by naming them with two leading underscores and no trailing underscores.

And just like private attributes, they are accessible by name inside the class.

A word about single underscores

So far we have dealt with names that start with two underscores, but it’s quite common to see names that start with a single underscore. They are not private in the same sense. Name mangling does not occur. A single underscore is mostly just a weak indication that the thing in question is meant to be used internally and is not part of the public interface of the class, module, etc., that it is inside.

In classes, attributes and methods that start with a single underscore are treated normally.

However, single underscores are not purely a stylistic thing. They do affect how the import  statement works.

PEP8 says:

_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.

This means that if we have a function called _hello_world  in a module called helloworld , and we import *  from it, then the _hello_world  function will not be pulled into the current scope.

It is possible to override the default hiding of objects with single leading underscores. __all__  is a list of the names of public objects exported by a module. If we add '_hello_world'  to the list, then it will be pulled in with the wildcard import.

The single underscore only affects wildcard imports, which we should avoid anyway. We can still grab the function specifically using from helloworld import _hello_world .

And that’s pretty much all you need to know about private attributes in Python!

Multi-line strings in Python

At some point, you will want to define a multi-line string and find that the obvious solutions just don’t feel clean. In this post, I’m going to take a look at three ways of defining them and give you my recommendation.

Concatenation

The first way of doing it, and the way that immediately comes to mind, is to just add the strings to each other, like so:

In my opinion, this looks extremely ugly. You can make it a bit better by omitting the + signs. This also works:

That’s better, but still a bit of an eyesore. Let’s move on.

Multi-line string syntax

Another way is to use the built-in multi-line string syntax, like so:

That’s much better than concatenation, but it has one conspicuous wart. You can’t indent the subsequent lines to be at the same level of indentation as the first one. The space will be interpreted as part of the string. The first line will be flush with the margin and the subsequent lines will be indented.

Tuple syntax

There is another way to do it that doesn’t suffer from the ugliness of concatenation or the indentation wart of the multi-line syntax, and that is to use the tuple syntax.

I have no idea why this works, but it does:

Note that you have to add the line breaks into the strings, because they’re not put in automatically. Nonetheless, I think this is by far the nicest, most readable method.

The difference between input and raw_input in Python

One of the first things that people notice when they ditch Python 2 and start coding in Python 3 – apart from the fact that print  is now a function – is that the raw_input  function has disappeared. So this Python 2 code:

must be converted to this in Python 3:

The change comes because the Python developers realized that they had made a dangerous mistake back in the early days. If you recall, the Python 2 version of the input  function used to be equivalent to this:

This allowed you to easily write programs that take input from the user and evaluate it as an int or a float or whatever type it is. For example:

raw_input, on the other hand, returned strings:

In Python 3, input  behaves like raw_input  in Python2, and the raw_input  function does not exist, so you have to do something like this (assuming you want to accept integer values):

Effectively, in the Python 2 version of the input  function, the string read from the prompt was evaled. To understand the danger of eval , you should take a look at this article by Ned Batchelder.

Automatically evaling whatever anybody decides to type at the prompt maybe makes things a little easier in a teaching context, because students don’t have to learn to convert strings to their intended types, but it also leaves the program open to executing arbitrary code that the user types in, revealing private information about your system or damaging it in some way.

Take this for example:

That will print the current working directory of your program.

Or if someone really wants to screw things up for you, they could just execute a recursive delete of your home directory. DO NOT RUN THIS CODE:

There is no need for Python to contain such footguns, regardless of their dubious teaching value.

To get the old behaviour of input  (which I hope I have convinced you that you do not want), replace your calls to it with eval(input()) . In fact, that is exactly what the automatic porting tool 2to3  does.

How to modify a list in place in Python

Did you ever see a piece of code that looks like this?

The purpose of it is to remove any even numbers from the list numbers . And it looks like it should work, right?

The problem explained

For most of you, the mistake will already be obvious, but beginners can expect to scratch their heads when they examine the list and see that it still has some even numbers in it.

Here’s what we wanted to see:

What is going on here? We can illuminate the problem if we write out the code without any of the syntactic niceties of the Python for  loop. (Keep in mind that this is merely for explanatory purposes and is not the kind of code you should be writing.)

Be your own debugger

Let’s “step through” the while  loop to see exactly how the execution goes wrong.

  • On the first iteration, the loop counter  i  is equal to 0. 1 (the value of the first element in numbers ) is assigned to elem . 1 is not divisible by 2 so the if  block is not called.
  • On the second iteration, i  is equal to 1. 2 (the value of the second element in numbers) is assigned to elem . 2 is clearly divisible by 2, so the if  block is called and the second element is removed from the list. The length of the list is reduced by 1. The element that was at the third position is now at the second position and so on. The list now looks like this.

  • On the third iteration, i  is equal to 2. 3 (the value of the third element in numbers , is assigned to elem . The 2 that used to be the third element, but is now the second, has been skipped entirely. It can never be removed from the list.
  • The 4 in the sixth position of the initial list is skipped in the same way.

Now that you understand how the code is going wrong, let’s see how to fix it.

How to fix it

There are several ways to refactor the code to give us the output we want. Let’s examine them.

First, we could use a list comprehension to build up a new list that contains only the elements in numbers  that are not divisible by 2. This is by far the simplest and most common way to achieve what we want. Here’s what it looks like:

A subtlety of this method is that it creates a completely new list and makes numbers point to it. If you had another name pointing to the old list, it will still point there. Here’s what I mean:

That may or may not be a problem, depending on the program, but if it is crucial to modify the list in place without copying anything, we can iterate backwards. While we’re at it, we can use del with an index instead of remove, which needlessly searches through the list for the first element with the given value:

So which one of these options should you use? If you are ok with creating a completely new list, then you should use the list comprehension. It’s by far the clearest and most idiomatic solution. Otherwise, if modifying the original list in place is crucial, you should go for the second version.