List, Dict and Set Comprehensions By Example

One type of syntactic sugar that sets Python apart from more verbose languages is comprehensions. Comprehensions are a special notation for building up lists, dictionaries and sets from other lists, dictionaries and sets, modifying and filtering them in the process.

They allow you to express complicated looping logic in a tiny amount of space.

List Comprehensions

List comprehensions are the best known and most widely used. Let’s start with an example.

A common programming task is to iterate over a list and transform each element in some way, e.g.:
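A minimal sketch of such a loop, using a hypothetical list of numbers that we want to double (the names numbers and doubled are only for illustration):

```python
numbers = [1, 2, 3, 4, 5]

# Build the new list one element at a time.
doubled = []
for n in numbers:
    doubled.append(n * 2)

print(doubled)  # [2, 4, 6, 8, 10]
```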

That’s the kind of thing you might do if you were a Java programmer. Luckily for us, though, list comprehensions allow the same idea to be expressed in far fewer lines.

The basic syntax for list comprehensions is this: [EXPRESSION FOR ELEMENT IN SEQUENCE].
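As a sketch, doubling every element of a hypothetical list called numbers fits that pattern. Here n * 2 is the expression, n is the element and numbers is the sequence:

```python
numbers = [1, 2, 3, 4, 5]

# [EXPRESSION for ELEMENT in SEQUENCE]
doubled = [n * 2 for n in numbers]

print(doubled)  # [2, 4, 6, 8, 10]
```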

Another common task is to filter a list and create a new list composed of only the elements that pass a certain condition. The next snippet constructs a list of every number from 0 to 9 that has a modulus with 2 of zero, i.e. every even number.
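One way to write that filter is to put the condition after the loop (the variable name evens is illustrative):

```python
# Keep only the numbers whose remainder after dividing by 2 is zero.
evens = [n for n in range(10) if n % 2 == 0]

print(evens)  # [0, 2, 4, 6, 8]
```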

Using an IF-ELSE construct works slightly differently to what you might expect. Instead of putting the ELSE at the end, you need to use the ternary operator – x if y else z.

The following list comprehension generates the squares of even numbers and the cubes of odd numbers in the range 0 to 9.
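A sketch of how that might look. Note that the ternary expression sits in the EXPRESSION position, before the for (the name powers is illustrative):

```python
# Square the even numbers, cube the odd ones.
powers = [n ** 2 if n % 2 == 0 else n ** 3 for n in range(10)]

print(powers)  # [0, 1, 4, 27, 16, 125, 36, 343, 64, 729]
```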

List comprehensions can also be nested inside each other. Here is how we can generate a two-dimensional list, populated with zeros. (I have wrapped the comprehension in pprint to make the output more legible.)
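A minimal sketch, assuming a grid of three rows and four columns (the dimensions and names are arbitrary):

```python
from pprint import pprint

# The inner comprehension builds one row; the outer one repeats it per row.
grid = [[0 for col in range(4)] for row in range(3)]

pprint(grid)
```

Because the inner comprehension is evaluated once per row, every row is a separate list, so changing one row does not affect the others.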

(As you have probably noticed, it is possible to create list comprehensions that are utterly illegible, so please think about who has to touch your code after you and exercise some restraint.)

On the other hand, the syntax of basic comprehensions might seem complicated to you now, but I promise that with time it will become second nature.

Generator Expressions

A list comprehension creates an entire list in memory. In many cases, that’s what you want because you want to iterate over the list again or otherwise manipulate it after it has been created. In other cases, however, you don’t want the list at all. Generator expressions – described in PEP 289 – were added for this purpose.

Let’s say you want to calculate the sum of the squares of a range of numbers. Without generator expressions, you would do this:

That creates a list in memory just to throw it away once the reference to it is no longer needed, which is wasteful. Generator expressions are essentially a way to define an anonymous generator function and call it, allowing you to ditch the square brackets and write this:
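The two versions side by side, as a sketch that assumes we are summing the squares of the numbers below 1000 (the range and the variable names are arbitrary):

```python
# List comprehension: builds the whole list before summing it.
total_from_list = sum([n * n for n in range(1000)])

# Generator expression: hands one square at a time to sum().
total_from_generator = sum(n * n for n in range(1000))

print(total_from_list == total_from_generator)  # True
```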

They are also useful for other aggregate functions like min and max.

The set and dict constructors can take generator expressions too:
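A short sketch of both ideas, using a hypothetical list of words:

```python
words = ["apple", "fig", "banana", "fig"]

# Aggregate functions consume the generator one item at a time.
shortest = min(len(w) for w in words)  # 3
longest = max(len(w) for w in words)   # 6

# set() and dict() accept generator expressions as well.
unique_lengths = set(len(w) for w in words)
length_by_word = dict((w, len(w)) for w in words)
```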

Dict Comprehensions

On top of list comprehensions, Python now supports dict comprehensions, which allow you to express the creation of dictionaries at runtime using a similarly concise syntax.

A dictionary comprehension takes the form {key: value for (key, value) in iterable}. This syntax was introduced in Python 3 and backported as far as Python 2.7, so you should be able to use it regardless of which version of Python you have installed.

A canonical example is taking two lists and creating a dictionary where the item at each position in the first list becomes a key and the item at the corresponding position in the second list becomes the value.
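A sketch of that example, assuming two hypothetical lists called keys and values:

```python
keys = ["a", "b", "c", "d"]
values = [1, 2, 3, 4]

# zip pairs up the items at matching positions in the two lists.
mapping = {key: value for (key, value) in zip(keys, values)}

print(mapping)
```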

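(On Python versions before 3.7 the output can come out jumbled, a reminder that dicts there have no guaranteed ordering. From Python 3.7 onwards, dicts preserve insertion order.)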

The zip function used inside this comprehension returns an iterator of tuples, where each element in the tuple is taken from the same position in each of the input iterables. In the example above, the returned iterator contains the tuples (“a”, 1), (“b”, 2), etc.

Any iterable can be used in a dict comprehension, including strings. The following code might be useful if you wanted to generate a dictionary that stores letter frequencies, for instance.
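A minimal sketch, assuming we want the frequency of each letter in a hypothetical string called text:

```python
text = "mississippi"

# Iterating over a string yields its characters one by one.
frequencies = {letter: text.count(letter) for letter in text}

print(frequencies)
```

Repeated letters simply overwrite the same key with the same count, so the result has one entry per distinct letter.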

(The code above is just an example of using a string as an iterable inside a comprehension. If you really want to count letter frequencies, you should check out collections.Counter.)

Dict comprehensions can use complex expressions and IF-ELSE constructs too. This one maps the numbers in a specific range to their cubes:

And this one omits cubes that are not divisible by 4:
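Both variants, sketched for the numbers 1 to 5 (the range is an arbitrary choice):

```python
# Map each number to its cube.
cubes = {n: n ** 3 for n in range(1, 6)}
# {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}

# The same mapping, keeping only the cubes that are divisible by 4.
cubes_divisible_by_4 = {n: n ** 3 for n in range(1, 6) if n ** 3 % 4 == 0}
# {2: 8, 4: 64}
```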

Set Comprehensions

A set is an unordered collection of elements in which each element can only appear once. Although sets have existed in Python since 2.4, Python 3 introduced the set literal syntax.

Python 3 also introduced set comprehensions.

Prior to this, you could use the set built-in function.

The syntax for set comprehensions is almost identical to that of list comprehensions, but it uses curly brackets instead of square brackets. The pattern is {EXPRESSION FOR ELEMENT IN SEQUENCE}.

The result of a set comprehension is the same as passing the output of the equivalent list comprehension to the set function.
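A sketch comparing the three spellings, using a hypothetical list of names that contains a duplicate:

```python
names = ["alice", "bob", "alice", "carol"]

# Set literal syntax.
literal = {"alice", "bob", "carol"}

# The older approach: build a list, then pass it to set().
from_builtin = set([name for name in names])

# Set comprehension: curly brackets instead of square ones.
from_comprehension = {name for name in names}

print(from_builtin == from_comprehension == literal)  # True
```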

That’s it for the theory. Now let’s dissect some examples of comprehensions.

Examples

List of files with the .png extension

The os module contains a function called listdir that returns a list of filenames in a given directory. We can use the endswith method on the strings to filter the list of files.

Here it is in use:
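A minimal sketch of such a function and a call to it. The function name list_png_files is an assumption for illustration:

```python
import os


def list_png_files(directory):
    """Return the names of the .png files directly inside directory."""
    return [name for name in os.listdir(directory) if name.endswith(".png")]


# List the .png files in the current working directory.
print(list_png_files("."))
```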

Merge two dictionaries

Merging two dictionaries together can be achieved easily in a dict comprehension:

Here is merge_dicts in action:
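One way merge_dicts could be written, as a sketch. The two-argument signature is an assumption, and when a key appears in both dictionaries the value from the second one wins:

```python
def merge_dicts(first, second):
    """Return a new dict holding the items of first, then those of second."""
    return {key: value for d in (first, second) for key, value in d.items()}


merged = merge_dicts({"a": 1, "b": 2}, {"b": 3, "c": 4})
print(merged)  # {'a': 1, 'b': 3, 'c': 4}
```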

Sieve of Eratosthenes

The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers. You might remember it from school. It works like this:

  • Starting at 2, which is the first prime number, exclude all multiples of 2 up to n.
  • Move on to 3. Exclude all multiples of 3 up to n.
  • Keep going like that until you reach n.

And here’s the code:
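A sketch of the steps above. The function name primes_up_to is an assumption, and I have treated n as inclusive:

```python
def primes_up_to(n):
    """Return the set of prime numbers from 2 to n inclusive."""
    # Outer loop (leftmost): every candidate i. Inner loop (rightmost):
    # the multiples of i, starting at 2 * i and stepping by i.
    not_primes = {j for i in range(2, n + 1) for j in range(i * 2, n + 1, i)}
    return {i for i in range(2, n + 1) if i not in not_primes}


print(sorted(primes_up_to(30)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```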

The first thing to note about the function is the use of a double loop in the first set comprehension. Contrary to what you might expect, the leftmost loop is the outer loop and the rightmost loop is the inner loop. The pattern for double loops in list comprehensions is [x for b in a for x in b].

In case you hadn’t seen it before, the third argument in the rightmost call to range represents the step size.

It would be possible to use a list comprehension for this algorithm, but the not_primes list would be filled with duplicates. It is better to use the automatic deduplication behaviour of the set to avoid that.

Exercises

I’ve included some exercises to help you solidify your new knowledge of comprehensions.

1. Write a function called generate_matrix that takes two positional arguments – m and n – and a keyword argument default that specifies the value for each position. It should use a nested list comprehension to generate a list of lists with the given dimensions. If default is provided, each position should have the given value, otherwise the matrix should be populated with zeroes.

2. Write a function called initcap that replicates the functionality of the string.title method, except better. Given a string, it should split the string on whitespace, capitalize each element of the resulting list and join them back into a string. Your implementation should use a list comprehension.

3. Write a function called make_mapping that takes two lists of equal length and returns a dictionary that maps the values in the first list to the values in the second. The function should also take an optional keyword argument called exclude, which expects a list. Values in the list passed as exclude should be omitted as keys in the resulting dictionary.

4. Write a function called compress_dict_keys that takes a dictionary with string keys and returns a new dictionary with the vowels removed from the keys. For instance, the dictionary {"foo": 1, "bar": 2} should be transformed into {"f": 1, "br": 2}. The function should use a list comprehension nested inside a dict comprehension.

5. Write a function called dedup_surnames that takes a list of surnames and returns a set of surnames with the case normalized to uppercase. For instance, the list ["smith", "Jones", "Smith", "BROWN"] should be transformed into the set {"SMITH", "JONES", "BROWN"}.

Solutions

1. Nest two list comprehensions to generate a 2D list with m rows and n columns. Use default for the value in each position in the inner comprehension.

2. Disassemble the sentence passed into the function using split, then call capitalize on each word, then use join to reassemble the sentence.

3. Join the two lists a and b using zip, then use the zipped lists in the dictionary comprehension.

4. Iterate over the key-value pairs from the passed-in dictionary and, for each key, remove the vowels using a comprehension with an IF construct.

5. Use the set comprehension syntax (with curly brackets) to iterate over the given list and call upper on each name in it. The deduplication will happen automatically due to the nature of the set data structure.
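The five solutions described above could be sketched as follows. The function names come from the exercises. The parameter names, and the choice to treat vowels case-insensitively, are assumptions:

```python
def generate_matrix(m, n, default=0):
    # Inner comprehension builds a row of n values; outer builds m rows.
    return [[default for _ in range(n)] for _ in range(m)]


def initcap(sentence):
    # split() with no arguments splits on any run of whitespace.
    return " ".join([word.capitalize() for word in sentence.split()])


def make_mapping(a, b, exclude=()):
    # Skip any key that appears in exclude.
    return {key: value for key, value in zip(a, b) if key not in exclude}


def compress_dict_keys(d):
    # The nested list comprehension drops the vowels from each key.
    return {
        "".join([ch for ch in key if ch.lower() not in "aeiou"]): value
        for key, value in d.items()
    }


def dedup_surnames(names):
    # The set discards duplicates once the case has been normalized.
    return {name.upper() for name in names}
```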

I’ll leave it there for now. If you’ve worked your way through this post and given the exercises a good try, you should be ready to use comprehensions in your own code.

If you’ve got any questions or other remarks, let me know in the comments.

How exactly do context managers work?

Context managers (PEP 343) are pretty important in Python. You probably use one every time you open a file:

But how well do you understand what’s going on behind the scenes?

Context manager classes

It’s actually quite simple. A context manager is a class that implements an __enter__ and an __exit__ method.

Let’s imagine you want to print a line of text to the console surrounded with asterisks. Here’s a context manager to do it:

The __exit__ method takes three arguments apart from self. Those arguments contain information about any errors that occurred inside the with block.

You can use asterisks in the same way as any of the built-in context managers:
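A minimal sketch of such a class and how it might be used. The width of 32 asterisks and the message are arbitrary choices:

```python
class asterisks:
    def __enter__(self):
        # Runs when the with block is entered.
        print("*" * 32)

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block is left, whether or not it raised.
        print("*" * 32)
        # A falsy return value lets any exception from the block propagate.
        return False


with asterisks():
    print("Context Managers Rock!")
```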

Accessing the context inside the with block

If you need to get something back and use it inside the with block – such as a file descriptor – you simply return it from __enter__:

myopen works identically to the built-in open:
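A sketch of what myopen might look like. The constructor arguments and the temporary file used to try it out are assumptions for the sake of a runnable example:

```python
import os
import tempfile


class myopen:
    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        # Whatever __enter__ returns is bound to the name after "as".
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()
        return False


path = os.path.join(tempfile.mkdtemp(), "example.txt")

with myopen(path, "w") as f:
    f.write("hello")

with myopen(path) as f:
    contents = f.read()

print(contents)  # hello
```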

The contextmanager decorator

Thankfully, you don’t have to implement a class every time. The contextlib package has a contextmanager decorator that you can apply to generators to automatically transform them into context managers:

The code before yield corresponds to __enter__ and the code after yield corresponds to __exit__. A context manager generator should have exactly one yield in it.

It works the same as the class version:
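Here is how the asterisks example might look as a decorated generator (the width of 32 is again arbitrary):

```python
from contextlib import contextmanager


@contextmanager
def asterisks():
    print("*" * 32)  # plays the role of __enter__
    yield
    print("*" * 32)  # plays the role of __exit__


with asterisks():
    print("Context Managers Rock!")
```

One thing to be aware of: if the with block raises, the exception is re-raised at the yield, so the code after it is skipped unless you wrap the yield in try/finally.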

Roll your own contextmanager decorator

The implementation in contextlib is complicated, but it’s not hard to write something that works similarly with the exception of a few edge cases:

It’s not as robust as the real implementation, but it should be understandable. Here are the key points:

  • The inner function instantiates a copy of the nested CMWrapper class with a handle on the generator passed into the decorator.
  • __enter__ calls next() on the generator and returns the yielded value so it can be used in the with block.
  • __exit__ calls next() again and catches the StopIteration exception that the generator raises when it finishes.
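The key points above can be sketched like this. The decorator name my_contextmanager and the tagged example generator are hypothetical, and the sketch ignores exceptions raised inside the with block:

```python
def my_contextmanager(generator_function):
    class CMWrapper:
        def __init__(self, generator):
            self.generator = generator

        def __enter__(self):
            # Run the generator up to its yield and hand back the value.
            return next(self.generator)

        def __exit__(self, exc_type, exc_value, traceback):
            try:
                # Resume after the yield; the generator should now finish.
                next(self.generator)
            except StopIteration:
                pass
            return False

    def inner(*args, **kwargs):
        return CMWrapper(generator_function(*args, **kwargs))

    return inner


events = []


@my_contextmanager
def tagged(name):
    events.append("enter " + name)
    yield name.upper()
    events.append("exit " + name)


with tagged("demo") as value:
    events.append("body " + value)

print(events)  # ['enter demo', 'body DEMO', 'exit demo']
```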

That’s it for now. If you want to learn more about context managers, I recommend you take a look at the code for contextlib.

How to throttle a Django view

In the last post, we built a todo list application with Django and Ember. It contained a register view to allow users to create accounts.

Right now, there is nothing to prevent an attacker from flooding that view and filling up the database with fake users. We want to throttle access to the view and limit the damage an attacker can cause.

I investigated several packages that provide throttling functionality, but the cleanest one I found is django-ratelimit. Here is how to use it:

Installing django-ratelimit and setting up the cache

First, we need to install it using pip:

Before we can use it to throttle views, we need to set up a cache for Django. Memcached is the recommended solution.

On Ubuntu 14.04, we can install Memcached and the python-memcached binding that Django needs with two commands:

Then we must add the CACHES dictionary to settings.py.
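A sketch of what that dictionary might contain, assuming Memcached is listening on its default port on the same machine. The backend path shown is the one that pairs with python-memcached in Django versions of that era; newer Django releases ship different Memcached backends, so check the documentation for your version:

```python
# settings.py (fragment)
CACHES = {
    "default": {
        # Assumption: the python-memcached based backend.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        # Assumption: Memcached running locally on its default port.
        "LOCATION": "127.0.0.1:11211",
    }
}
```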

Check the Django documentation on caching for more detailed instructions.

Throttling a function-based view

Throttling the register view is as simple as applying a decorator to it.

We are setting the key argument to "ip", which means that we are throttling requests from the same IP address. It is possible to use different keys, but the IP address will work for our purposes.

We are also setting the rate to 10/h, or ten times per hour.

The method argument specifies which HTTP methods are to be throttled. ratelimit.UNSAFE is a shorthand for the list ['DELETE', 'PATCH', 'POST', 'PUT'].

If the limit is exceeded, the server will return 403 Forbidden.

Throttling a class-based view

If we had to throttle a class-based view, we would use the mixin that django-ratelimit provides.

The mixin works in much the same way as the decorator, but instead of decorator arguments we are using class attributes prefixed with ratelimit_.

Specifying rates

We can specify rates in the format N/u or N/Mu, where N and M are integers and u is a unit. Here are some examples:

  • 1000/s – One thousand times per second
  • 5/m – Five times per minute
  • 10/h – Ten times per hour
  • 20/d – Twenty times per day
  • 100/5h – One hundred times every five hours
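To make the format concrete, here is a small sketch that turns a rate string into a count and a period in seconds. parse_rate is a hypothetical helper written for this post, not part of django-ratelimit, and it only handles the four units listed above:

```python
import re

SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 60 * 60, "d": 24 * 60 * 60}


def parse_rate(rate):
    """Return (count, period_in_seconds) for a string such as '100/5h'."""
    match = re.fullmatch(r"(\d+)/(\d*)([smhd])", rate)
    if match is None:
        raise ValueError("unrecognised rate: %r" % rate)
    count, multiplier, unit = match.groups()
    # A missing multiplier (as in '10/h') means one unit.
    period = int(multiplier or 1) * SECONDS_PER_UNIT[unit]
    return int(count), period


for example in ("1000/s", "5/m", "10/h", "20/d", "100/5h"):
    print(example, parse_rate(example))
```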

That’s it for now. django-ratelimit is a pretty good solution for throttling normal views. If you are using an API library such as Django Rest Framework, you should check if it has built-in throttling functionality.

Making Ember and Django play nicely together: a to-do app walkthrough

I’ve wanted to get started building applications with Ember for a while, but I never invested the time to figure out how to integrate it with Django. I’ve spent the last few days doing just that, however, and it’s been a nightmare of outdated libraries and vague documentation. The main obstacle was getting authentication working. At one point I ended up on page 7 of Google, so that will give you an idea of how bad it was. I’m writing this post so you don’t have to go through the same pain.

Here is what we are going to build:

  • An Ember todo list CRUD app
  • Using a JSON API-compliant backend built with Django Rest Framework
  • Secured using token authentication
  • With the ability to log in and register new users

Here are some screenshots of the (unstyled) finished application:

[Screenshots: new todo, todo list index, user login, user registration]

This tutorial is for Ember 2.4 and Django 1.9. I recommend that you go through the basic Ember tutorial first to orientate yourself, although I will try to explain everything as I go along.

Setting up the project

We’re going to put the Ember project and the Django project side by side in the same directory. We are not going to embed the Ember project within Django’s static files, as some people do. There is really no reason to do that because the whole point of front-end Javascript frameworks is that they are backend-agnostic – they should work equally well with any backend that implements the necessary API calls. It is not even obvious that we would put the Ember client and the Django API on the same server, so it is best to keep them separate.

Create a directory to hold everything and cd into it:

Use the ember new command to generate an Ember application called todo-ember in that directory:

cd into the directory and install some Ember libraries that we are going to need:

Now we will generate the Django project. We will also start a virtualenv for it. From the root directory – the one where we ran ember new – run the following commands:

And we will generate an app inside our project to hold our todo list implementation:

We’re going to need some Django packages to build an API that plays nice with Ember. Install them using pip:

Now add everything to INSTALLED_APPS in settings.py.

We need the django-cors-headers package to add Cross Origin Resource Sharing headers to our API responses. This will allow our Ember development server running on localhost:4200 to talk to the Django development server on localhost:8000. Add the middleware from that package to MIDDLEWARE_CLASSES:

And set the CORS_ORIGIN_ALLOW_ALL setting to True (this will be fine for development, but don’t run it like that in production):

Now add the Django Rest Framework settings to settings.py. Most of this is overriding Django Rest Framework defaults with classes from the djangorestframework-jsonapi package, which make Django Rest Framework conform to the JSON API specification that Ember Data expects. We also set TokenAuthentication as a default authentication class:
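An abridged sketch of how the CORS and Django Rest Framework settings described above might look in settings.py. The class paths are the ones I understand these packages to provide, the middleware list is cut short, and a full configuration would likely override more defaults than the three shown here:

```python
# settings.py (fragment)
MIDDLEWARE_CLASSES = [
    # Placed early so the CORS headers are added to every response.
    "corsheaders.middleware.CorsMiddleware",
    "django.middleware.common.CommonMiddleware",
    # The rest of the default middleware goes here.
]

# Fine for development; do not run production like this.
CORS_ORIGIN_ALLOW_ALL = True

REST_FRAMEWORK = {
    "DEFAULT_PARSER_CLASSES": [
        "rest_framework_json_api.parsers.JSONParser",
    ],
    "DEFAULT_RENDERER_CLASSES": [
        "rest_framework_json_api.renderers.JSONRenderer",
    ],
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.TokenAuthentication",
    ],
}
```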

Implementing the Django TodoItem model

With the basic project skeleton in place, we can create a model for todo items. Open models.py in the todo app and implement the following:

Creating a serializer for TodoItem

Because we are exposing the model with Django Rest Framework, it is going to need a serializer. Make a file called serializers.py in the todo app directory. It should look like this:

Exposing TodoItem as an API

We can take advantage of standard Django Rest Framework functionality to expose REST endpoints for the model. In views.py in the todo app, add the following:

Before we can actually call the endpoints, we need to register the ViewSet in urls.py.

In todo_django/urls.py, register it as follows:

One important point: when instantiating the DefaultRouter, make sure to pass the trailing_slashes=False argument. Otherwise Django will try to redirect calls to /api/todos to /api/todos/, which confuses Ember Data.

As usual when we create a new Django model, we need to migrate:

Now run the Django development server and go to http://localhost:8000/api/todos. You should see the standard Django Rest Framework browseable API screen.

Generating an Ember scaffold with ember-cli-scaffold

We can use the pretty cool ember-cli-scaffold to automatically generate most of what we need to CRUD todo items on the Ember side. From inside the todo-ember directory, run this command:

You should get a printout of everything that was generated. As you can see, it generated a model, routes and templates for us. You can look through the generated files to see what is going on in them.

Something that sucks about the scaffold is that it generated input text components for the boolean attribute on the model. We can change that easily enough.

Open the handlebars template in todos/-form.hbs and find the form component linked to the model’s done attribute. It should look like this:

Just change it to this:

It also automatically set up a Mirage mock api server to test against, but we are going to use the Django development server, so we don’t need that in our project. Get rid of it like so:

And it generated an adapter in adapters/todos.js that we won’t need, so delete that too.

Adding an application adapter

Ember Data uses adapters to take requests for information from the store and translate them into calls to the persistence layer. We will create an application adapter that Ember Data will use for all API calls by default:

Open up adapters/application.js and add the following:

You’ll notice that we have imported ENV from todo-ember/config/environment.js. This is the Ember equivalent of Django’s settings.py where we can set up different options for when the app is running in dev, test or production.

Let’s set values for ENV.host in there. We’re pointing to the Django development server in development mode:

With that in place, the Ember development server will direct all its API calls to localhost:8000 where your Django development server is running.

Start both servers now and in your browser go to http://localhost:4200/todos. You should find that you can create, read, update and delete records on the server from your Ember frontend. Keep an eye on the Django dev server output to see the requests going through.

Setting up Django for token authentication

So far we’ve built a lot of functionality without much code, but we’ve got a problem: anybody can access the server and change our data. Let’s fix that so that only registered users can access the todos route.

We are going to use the token authentication mechanism that comes with Django Rest Framework.

First we will add an endpoint in Django that our Ember application can call to receive a token that it will use to validate future API calls. Edit urls.py:

How does this endpoint actually work?

We are going to set up our Ember application to POST some JSON to the endpoint. The JSON will contain a username and password. If the username and password are correct, the endpoint will return a 200 response containing a token.

If the username and password are not correct, it will return a 400 response with an error message. You can try it out with curl.

First, let’s see what it does with bad credentials:

Now with good credentials:
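If you would rather poke at the endpoint from Python, the same request can be sketched with the standard library. The helper names build_token_request and fetch_token are hypothetical, the path /api-auth-token/ is the one used for this endpoint in this tutorial, and fetch_token only works while the Django development server is running:

```python
import json
import urllib.error
import urllib.request


def build_token_request(host, username, password):
    """Build the POST request that carries the credentials as JSON."""
    body = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        host + "/api-auth-token/",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(host, username, password):
    """Return (status, payload); needs the Django dev server to be up."""
    request = build_token_request(host, username, password)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # Bad credentials come back as a 400 with a JSON error body.
        return error.code, json.loads(error.read().decode("utf-8"))
```

Calling fetch_token("http://localhost:8000", "someuser", "somepassword") should give you either a 200 and a payload containing the token, or a 400 and the error message.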

We will use an Ember addon called Ember Simple Auth to store this token and add it as a header to future API calls.

Setting up Ember Simple Auth

In the Ember project, add the following settings to environment.js. They control the behaviour of Ember Simple Auth:

The code is quite self explanatory, but if you don’t get it right now, relax. You will see what those settings are used for later when we start adding route mixins.

Now generate an application controller (you’ll need to generate a route first):

And add the following code to it:

We are implementing the invalidateSession action on this controller so we can have a logout link on every page. Add the following snippet to the application template in application.hbs:

Creating the login form and controller

Let’s generate an Ember route called login where our login form will live.

While we’re at it, let’s generate a controller for the route:

Open the template for the route – login.hbs – and add a simple login form:

This won’t work until we implement the authenticate action on the controller, so open up controllers/login.js and add the following:

The code in this file deserves some explanation. At the top of the controller, we are injecting the Ember Simple Auth session service, which manages session state for the application.

Then, in the authenticate action, we are calling authenticate on the service. You have probably noticed that the first argument to authenticate is drf-token-authenticator. This refers to a custom Ember Simple Auth authenticator that we have not implemented yet, so let’s do that.

Implementing a custom authenticator

Inside your Ember app directory, make a directory called authenticators. Inside that directory, make a file called drf-token-authenticator.js. This is where our custom authenticator will live.

Add the following code to the file:

A few words of explanation are required:

We are extending the Ember base authenticator and implementing the restore and authenticate methods.

restore “restores the session from a session data object. This method is invoked by the session either on application startup if session data is restored from the session store.” If we don’t implement this, then if we log in and refresh the page, for instance, we will be kicked back to the login screen.

authenticate is where we call the /api-auth-token/ endpoint we created in Django. The method returns an Ember Promise that resolves or rejects based on the response to the API call.

The API call itself is made with jQuery, which is embedded into Ember. Notice that we are using the ENV.host that we placed in environment.js earlier to direct the AJAX request to the proper endpoint.

If you run the application now and go to http://localhost:4200/login you should see a login form. Try to log in with an invalid username and password first. You should see the error JSON from the server underneath the form. Then try it with a valid username and password. You should see the “login” link at the top of the page change to a “logout” link.

Securing the API

So far so good, but we’re not actually preventing anyone from sending unauthenticated requests to the API. To do that, we need to edit the ViewSet in Django:

Notice that we added two new properties – authentication_classes and permission_classes.

Right now if you go to http://localhost:4200/todos you will find that our application is broken. Django expects each incoming request on the /todos/ route to have a valid Authorization header with the value Token my_token_value. In order to automatically add that header to outgoing requests, we need to write a custom authorizer.

Writing a custom authorizer

Authorizers are components of Ember Simple Auth that add authorization information to requests made by Ember.

Make a directory inside your app called authorizers and make a file inside that directory called drf-token-authorizer.js. Add the code below to that file:

We are extending the base authorizer from Ember Simple Auth and implementing authorize on it. This checks that the session is authenticated, grabs the token from the passed in sessionData variable, and adds it as a header by calling the passed in block with the header name and the header value.

To make sure that the header is added to all API calls that Ember makes, we need to modify the application adapter. Open adapters/application.js and change it from what we had earlier:

You can see here that we are importing the DataAdapterMixin from Ember Simple Auth and adding it to the adapter as a mixin. We also need to set up authorizer to point to our custom authorizer.

At this point, if you try your application again you should be able to successfully view, edit, create and delete todo list items.

Adding Ember Simple Auth route mixins

A problem with our application right now is that it does not prevent you from accessing the todos route without logging in (although after the last change the API call to fetch the todo items won’t work). We would like users who go straight to todos without logging in to be sent instead to the login route.

It doesn’t automatically send you to todos after login either.

And it would be great if users who go to login while already logged in were sent back to todos.

It is quite simple to achieve these things using route mixins. The first one we need to add is in the application route.

Open routes/application.js and change it so it looks like this:

This will enable Ember Simple Auth to automatically change the route based on the authentication state.

Now we need to protect the todos route so that it cannot be accessed without being logged in. Open each route in routes/todos/ and add the AuthenticatedRouteMixin. For example, routes/todos/edit.js should look like this:

Do the same for the nested index and new routes.

If you go to http://localhost:4200/todos now without being logged in you will be kicked back to the login page. Where you go depends on the value for authenticationRoute that we added to environment.js earlier. routeAfterAuthentication and routeIfAlreadyAuthenticated control where you go when you log in and when you come back to the site after already being logged in.

The last mixin we need to add is the UnauthenticatedRouteMixin in the login route. It prevents authenticated users from seeing the login page. Here is what your routes/login.js should look like after you add it.

Registering users

At the moment all we can do is log in with existing users. Nobody else can register an account. Let’s fix that.

We’re going to need a registration endpoint in Django that Ember can use. Here is one I pulled out of an old project. It’s not going to win any beauty contests, but it does the trick:

It relies on a user registration form that takes a username, email address, a password and a copy of the password to make sure they are the same. Here it is (it lives in forms.py):

This is all pretty standard Django stuff, but note the use of JsonResponse and the use of the HTTP status code to signal the outcome of the request.

The view has to be added to urls.py, naturally:

Now that the Django side is ready, let’s take care of the Ember side. We have to generate a register route and controller.

Then we can put a registration form in register.hbs.

Modify the autogenerated register route to include the UnauthenticatedRouteMixin:

Now we will implement the register action on the register controller that the registration form uses. It looks like this:

This follows the same pattern as the login form. We grab the values in the form fields then use JSON.stringify to assemble them into a JSON payload that we POST to the server. The server then returns either a 201 Created response or a 400 response with information about what went wrong.

If the success callback is fired, we hide the registration form and show the “Signup Complete” message. Otherwise, we write the error message below the form.

In the same way as with the login form, we are just writing the error responses in raw JSON below the registration form. Exactly how you alert the user to errors will depend on your CSS framework and the facilities it provides for form element highlighting and so on. In any case, it is a trivial matter to display the errors more nicely on the page, so we won’t waste time with that today.

Now you should be able to register new users and log in with them.

Make a link to the registration form by editing the application.hbs template. Change the line with the “login” link to include a “register” link too:

Linking Todo items to users

Right now when users log in they see all the todo items and not just the ones that belong to them. Let’s fix that so that todo items are linked to users and users can only see and edit their own.

Add a foreign key to the TodoItem model that points to the user:

As with all model changes, we need to migrate. When we run makemigrations we will be prompted to specify a default value for the user field on existing rows. Just give it the primary key of an existing user:

Then apply the generated migration:

To restrict the todo items to the current user, override the get_queryset method on the ViewSet:

And remove the queryset:

At this point we need to add the base_name to the router to avoid errors:

When new todo items are created, we want them to be linked to the current user. We can achieve that by overriding perform_create on the ViewSet:

We also want object-level permissions that prevent people from directly accessing or modifying other users’ todo items, so we will implement a custom Django Rest Framework permission class.

You can put the following code anywhere, but I like to put it in a file called permissions.py in the todo app:

This class implements has_object_permission(self, request, view, obj) from the base class and performs a simple check to see if the user on the object is the same as the authenticated user.

Now change permission_classes on the ViewSet to apply this permission:

That’s it! At this point if you register a bunch of users they will all have their own todo items.

The end

Phew! This has been a pretty long post, but if you have followed along you have gotten over the biggest initial hurdles of working with Ember and Django.

Check out the resources in the next section to learn more.

Resources

The Ember Quickstart Guide should be the first destination in your Ember journey.

The Ember Simple Auth documentation will help you get to grips with this indispensable Ember library.

The Django Rest Framework JSON API package makes linking Ember Data and Django Rest Framework pretty seamless.

Built With Ember showcases the sophisticated user experience that can be achieved with this cool framework.

How to automatically lint your Python code on commit

Increasing code quality is a constant battle for all developers so it makes sense to use every tool available.

Ian Cordasco’s flake8 is pretty much the standard in Python linting at the moment. It wraps three libraries: pyflakes (a static analyzer/linter), pep8 (a PEP8 checker) and McCabe (a cyclomatic complexity checker).

Running it against a project is dead simple. Just install it, go to your project root, and use the flake8 command:

If you want, you can run it yourself each time, but it is better to set up a pre-commit hook so that it runs every time you try to check in code and blocks the commit if it detects any quality problems.

There is a way to automatically install the hook (flake8 --install-hook), but I have found it unreliable, so I just manually add flake8 to the .git/hooks/pre-commit script. Here’s what my script looks like:

Remember that this script needs to be executable.
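A hook does not have to be a shell script. As a sketch only (this is not the script referred to above), the same idea can be written in Python. The function name run_linter is made up for illustration, and it assumes flake8 is installed for the interpreter that runs the hook.

```python
#!/usr/bin/env python3
import subprocess
import sys


def run_linter(command=None):
    """Run the linter and return its exit code (non-zero means problems)."""
    if command is None:
        # Assumption: flake8 is importable by this interpreter.
        command = [sys.executable, "-m", "flake8"]
    return subprocess.run(command).returncode


# To use this file as .git/hooks/pre-commit, finish it with:
#     sys.exit(run_linter())
# Git aborts the commit whenever the hook exits with a non-zero status.
```

The function returns the exit code instead of exiting itself, which makes it easy to try out with any command before wiring it into Git.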

After you add that hook, your commits will be blocked if flake8 identifies any issues. The problem is that out of the box it is quite sensitive, so you might want to allow certain things to be committed that it thinks are problematic. There are two ways to do that.

For single lines that cause problems (because they are too long and don’t pass PEP8, for instance), you can add an ignore directive in a comment, like so:
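A small illustration of such a directive: the constant name and URL are invented, and the only part that matters is the trailing comment.

```python
# This line is far longer than PEP8 allows; the "# noqa" comment at the end
# tells flake8 to skip every check on this one line.
DOCS_URL = "https://example.com/a/very/long/path/that/pushes/this/line/well/past/the/usual/seventy-nine/characters"  # noqa
```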

Or if you want to ignore a certain type of error for the entire project, or exclude certain subdirectories, you can add a .flake8 config file and set up ignores and excludes in there. Here’s an example:

The ignore property takes a comma-separated list of error codes that should be ignored. In this case, E501 is the “line too long” error. The exclude property takes a comma-separated list of files and locations not to check. max-complexity specifies the maximum cyclomatic complexity allowed for functions, as determined by the McCabe library. The default value is 10, but I find that very conservative so I like to set it to 16.
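To make those three properties concrete, here is a sketch of what such a file could contain, held in a Python string and parsed with configparser purely to show how the values break down. The excluded paths are assumptions; only E501 and the complexity of 16 come from the text above.

```python
import configparser

# Possible contents of a .flake8 file (the exclude entries are made up).
SAMPLE_FLAKE8_CONFIG = """
[flake8]
ignore = E501
exclude = .git,migrations
max-complexity = 16
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_FLAKE8_CONFIG)
settings = parser["flake8"]

ignored_codes = settings["ignore"].split(",")
excluded_paths = settings["exclude"].split(",")
max_complexity = settings.getint("max-complexity")

print(ignored_codes)   # ['E501']
print(excluded_paths)  # ['.git', 'migrations']
print(max_complexity)  # 16
```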

That should be more than enough to get started, but if you want to learn more, you can check out the flake8 docs here.

Why Is My Django/MySQL Application Showing Unicode as Question Marks?

Back up your database before you try anything here. Sometimes character set conversions can change your data in ways you don’t want. Be sensible and use mysqldump or something to safeguard it before you start messing around. Needless to say, you should try everything in a test environment first.

When you run a Django application (or any other web application, for that matter) on top of a stock MySQL install, you might hit a problem with storing Unicode characters. I saw it in a Django project that had to deal with Arabic text. Instead of the Arabic characters, it just showed a bunch of question marks.

Here’s how to fix it.

Check your MySQL character set

Out of the box, your MySQL character set is probably latin1. We’re going to change it to utf8.

First, run this command to check that you are in fact dealing with an incorrect character set:

In the output, you will probably see the following line:

If you do, keep going. We’re going to sort it out.

Edit my.cnf

The main MySQL configuration file is called my.cnf. On Ubuntu it is located at /etc/mysql/my.cnf. You can check where it is on your own system by running locate my.cnf.

The file is divided into sections and the start of each section is indicated with a name in square brackets. We’re interested in the sections [client] and [mysqld].

After making a backup of the current state of the file, open it in your text editor of choice and find the [client] section. Add the following line to it:

Next, find the [mysqld] section and add the following three lines to it:

Be careful that you add the code to the right sections. If you make a mistake here then MySQL will not start and it won’t write any useful error message to the logs.

Save my.cnf and restart MySQL. On many systems, you can do this with the service command:

Alter each table to use the new character set

First, you want to generate the script you are going to use to convert each table one by one to the new character set. Change the database name, username and password to the correct values and run this in the terminal.

It will generate the SQL you need to change each of your tables. For example, if your database contained three tables called users, comments and posts, the generated code would look like this:

Run that code against the database using your tool of choice. It might take a while, depending on the size of your tables. You’ll know when you try it on your test environment. When it’s done, those question marks should be history.
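For the three-table example above, the conversion statements could also be produced by a short Python script. This is only a sketch: the function name is invented, and the utf8_general_ci collation is an assumption rather than something specified in this post.

```python
def build_alter_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        f"ALTER TABLE {table} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]


statements = build_alter_statements(["users", "comments", "posts"])
for statement in statements:
    print(statement)
# ALTER TABLE users CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
# ALTER TABLE comments CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
# ALTER TABLE posts CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```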

How variable scope works in Python

Someone asked me to take a look at a piece of code recently and tell him why it wasn’t working. The problem was that he didn’t really understand Python variable scoping. That’s what I’m going to talk about today. It is quite basic, but you really need to have it down cold, and there are a few surprises in there too.

What you need to know

A variable in Python is defined when you assign something to it. You don’t declare it beforehand, like you can in C. You just start using it.

Any variable you declare at the top level of a file or module is in global scope. You can access it inside functions.

Before I go on I need to add a disclaimer: global variables are almost always a bad idea. Yes, sometimes you need them, but you almost always don’t. A good rule of thumb is that a variable should have the narrowest scope it needs to do its job. There’s a good discussion of global variables and the associated issues here.

Modifying the value of a global variable is less simple. Take a look at this example.
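A minimal sketch of such an example, with the names and values chosen to match the discussion that follows:

```python
x = 123


def foo():
    x = 321  # assignment creates a new local x; the global x is untouched
    print(x)


foo()     # prints 321
print(x)  # prints 123
```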

What happened? Why is the value of x 123 for the second print statement? It turns out that when we assigned the value 321 to x inside foo we actually declared a new variable called x in the local scope of that function. That x has absolutely no relation to the x in global scope. When the function ends, that variable with the value of 321 does not exist anymore.

To get the desired effect, we have to use the global keyword.
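Here is a small sketch of the same function with the global keyword added, so that the assignment targets the module-level x:

```python
x = 123


def foo():
    global x  # assignments to x in this function now rebind the global
    x = 321
    print(x)


foo()     # prints 321
print(x)  # prints 321
```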

That’s more like it.

There is one more scope we have to worry about: the enclosing scope created by declaring one function inside another one. Watch.
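A minimal sketch of an enclosing scope, using the made-up function names outer and inner:

```python
def outer():
    x = 123

    def inner():
        # x is not defined inside inner, so Python looks it up in outer's scope.
        return x

    return inner()


print(outer())  # prints 123
```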

What if you want to modify the value of x declared in the outer function? You’ll run into the same problem that made us use global. But we don’t want to use global here. x is not a global variable. It is in the local scope of a function.

Python 3 introduced the nonlocal keyword for this exact situation. I wrote a post about it on this page, but I’ll show you a quick example now.
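A short sketch of nonlocal at work, again with the illustrative names outer and inner:

```python
def outer():
    x = 123

    def inner():
        nonlocal x  # rebind the x that lives in outer's scope
        x = 321

    inner()
    return x


print(outer())  # prints 321
```

Without the nonlocal line, the assignment inside inner would create a new local variable and outer would still return 123.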

A simple way to remember Python scoping rules

In his book Learning Python, Mark Lutz suggests the following mnemonic for remembering how Python scoping works: LEGB

Going from the narrowest scope to the widest scope:

  • L stands for “Local”. It refers to variables that are defined in the local scope of functions.
  • E stands for “Enclosing”. It refers to variables defined in the local scope of functions wrapping other functions.
  • G stands for “Global”. These are the variables defined at the top level of files and modules.
  • B stands for “Built in”. These are the names that are loaded into scope when the interpreter starts up. You can look at them here: https://docs.python.org/3.5/library/functions.html
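The four levels can be seen side by side in a small sketch. All of the function names here are made up for illustration.

```python
name = "global"  # G: defined at the top level of the module


def outer():
    name = "enclosing"  # E: local to outer, enclosing for the inner functions

    def inner_with_local():
        name = "local"  # L: found first, in the function's own scope
        return name

    def inner_without_local():
        return name  # no local name here, so the enclosing one is used

    return inner_with_local(), inner_without_local()


def uses_global():
    return name  # nothing local or enclosing, so the global is used


def uses_builtin():
    return len("abc")  # B: len is not defined in this file; it is a built-in


print(outer())         # ('local', 'enclosing')
print(uses_global())   # global
print(uses_builtin())  # 3
```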

And that is everything you need to learn about this topic for the vast majority of Python programming tasks.

How to fix database race conditions in Django views

Today I’m going to show you how to fix an extremely common error in Django applications. My guess is about 90% of Django applications deployed in the wild suffer from this error and, like 72% of statistics, I just made that one up on the spot. Seriously though, it’s pretty common.

Imagine you’ve got an online bookstore application with a Book model that has a quantity attribute. When somebody buys a copy of one of your books, you want to decrease the quantity attribute by 1. Here is the naive way to do it:

At the start when you’ve got a small load on your system, this will seem to work fine. Now imagine your bookstore grows, you open some new branches, and there are multiple updates being run on your application every second. That’s when strange things will start to happen. Here is how two concurrent updates might play out with our current code. book1 represents the first concurrent update and book2 represents the second:

At the start of both concurrent updates, an identical copy of the data in the database is loaded into memory. The inventory quantity is decreased on each copy, then the new quantity is written back to the database, with the second update clobbering the first. Result: it is as if one of the updates never happened.

In database terms, what we need is called a SELECT FOR UPDATE. Basically, this locks the row in the database until the new information is written back, preventing a second instance from reading and modifying data that might be in the process of changing.
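The lost update, and the way a lock prevents it, can be imitated in plain Python without a database. This is only an analogy: BookRow is a hypothetical in-memory stand-in for a table row, and a threading.Lock plays the part that the row lock plays in the database.

```python
import threading


class BookRow:
    """In-memory stand-in for a database row (not a Django model)."""

    def __init__(self, quantity):
        self.quantity = quantity
        self.lock = threading.Lock()


def naive_concurrent_purchases(row):
    # Both "requests" read the same value before either one writes.
    book1_quantity = row.quantity
    book2_quantity = row.quantity
    row.quantity = book1_quantity - 1  # first write
    row.quantity = book2_quantity - 1  # second write clobbers the first
    return row.quantity


def locked_purchase(row):
    # Read, modify and write while holding the lock, so nobody else can
    # read a value that is about to change.
    with row.lock:
        current = row.quantity
        row.quantity = current - 1


def locked_concurrent_purchases(row):
    threads = [threading.Thread(target=locked_purchase, args=(row,)) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return row.quantity


print(naive_concurrent_purchases(BookRow(10)))   # 9: one sale was lost
print(locked_concurrent_purchases(BookRow(10)))  # 8: both sales counted
```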

Since Django 1.4, implementing SELECT FOR UPDATE through the ORM is really simple:

That will lock the row selected with get until the end of the transaction block, which since Django 1.5 corresponds to the end of the request by default.

select_for_update is compatible with the postgresql_psycopg2, oracle, and mysql database backends. It doesn’t work for the sqlite backend.

Text to speech with Python 3 on Linux and OSX

Recently I had a requirement to synthesise speech from text on two different operating systems. Here is what I came up with.

OSX

Synthesising speech is a simple matter for OSX users because the operating system comes with the say command. We can use subprocess to call it.
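A minimal sketch of that call: the helper names are invented for illustration, and speak will only work on a machine that actually has the say command.

```python
import subprocess


def build_say_command(text):
    """Return the argument list for the say command."""
    return ["say", text]


def speak(text):
    # Passing a list avoids shell quoting problems with the text.
    subprocess.run(build_say_command(text), check=True)


# On OSX: speak("Hello world")
```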

Linux

On Linux, there are a few different options. I like to use the espeak Python bindings when I can. You can install it on Ubuntu using apt-get.

Then use it like so:

espeak supports multiple languages, so if you are not dealing with English text, you need to pass in the language code. Unfortunately, it looks like the Python bindings don’t support that yet, but we can still use subprocess like we did on OSX.
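As a sketch, the language code can be passed to the espeak command line program through its -v option. The helper names are made up, and speak assumes the espeak binary is installed and on the PATH.

```python
import subprocess


def build_espeak_command(text, language="en"):
    """Return the argument list for espeak with the given voice/language."""
    return ["espeak", "-v", language, text]


def speak(text, language="en"):
    subprocess.run(build_espeak_command(text, language), check=True)


# On Linux: speak("Bonjour tout le monde", language="fr")
```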

The list of available languages can be found on the espeak website here.