How to integrate New Relic with Django, Apache and mod_wsgi

I just finished setting up New Relic application monitoring on Recommendify. It was a little bit of a painful process and it wasn’t properly described in their docs, so I’m going to note the steps I went through here. Hopefully it will help somebody else in the same situation.

Install the New Relic agent

First, install the newrelic package using pip.

Generate the New Relic configuration

Then copy the command to generate the configuration file from your New Relic dashboard. It will look something like this:

newrelic-admin generate-config LICENSE-KEY newrelic.ini

LICENSE-KEY will be the specific license key for your account.

This command will create a file called newrelic.ini. Copy it somewhere safe on your server and make sure it has permissions to be read by the user Apache runs under.

Edit wsgi.py

The next step took some time to figure out. New Relic provides a wrapper script for Python applications, but it doesn’t work for setups using embedded interpreters, such as Apache with mod_wsgi. As this is how Recommendify is configured, I had to find another way.

I edited the wsgi.py file from my Django project to wrap the WSGI application object.

This script imports the socket module and uses the socket.gethostname() function to get the hostname of the current machine. I did this so that New Relic would only log data in production and not during development.

At the bottom of the script, I check if the application is running on the production environment. If it is, I import newrelic.agent, initialize it by passing it the path to the newrelic.ini created during the previous step, then wrap the WSGI application object using the WSGIApplicationWrapper class.

You will need to use your own Django settings module, hostname string, and path to newrelic.ini, of course.
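The modified wsgi.py isn't reproduced here. Below is a minimal sketch of the same idea, restructured as a helper function so the hostname check can be tried without Django or the agent installed. The hostname and config path are hypothetical placeholders, and the function name wrap_for_production is my own. The agent calls are the newrelic.agent.initialize and newrelic.agent.WSGIApplicationWrapper described above.

```python
import socket

# Hypothetical placeholders: substitute your own production hostname and the
# path to the newrelic.ini generated earlier.
PRODUCTION_HOSTNAME = "my-production-server"
NEWRELIC_CONFIG_FILE = "/path/to/newrelic.ini"


def wrap_for_production(application, hostname=None):
    """Return the WSGI app wrapped by New Relic on production, unchanged elsewhere."""
    if hostname is None:
        hostname = socket.gethostname()
    if hostname != PRODUCTION_HOSTNAME:
        # Development machine: don't report anything to New Relic.
        return application

    # Imported lazily so the agent only has to be installed in production.
    import newrelic.agent

    newrelic.agent.initialize(NEWRELIC_CONFIG_FILE)
    return newrelic.agent.WSGIApplicationWrapper(application)
```

At the bottom of wsgi.py, you would then apply it to the Django application object, e.g. application = wrap_for_production(get_wsgi_application()).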

Once you deploy the modified wsgi.py, you should start seeing monitoring information in New Relic after about five minutes. Here’s what it looks like for me.

[Screenshot: New Relic monitoring dashboard]

What to do after Codecademy

Today I want to offer my opinion on a question that lots of people have asked me since I started this blog. “What should I do after I have finished an introductory course on Codecademy?”

Codecademy is basic stuff

You should be proud to have finished your first course and dipped your toes into programming. It’s a great achievement, but you also have to be realistic: this is just the beginning. You have made your way to base camp, but you haven’t yet attempted to reach the summit.

The fact is that Codecademy, while providing a solid grounding in the basics, doesn’t go much farther than that. On its own, it only teaches you a fraction of the things that you need to know to program professionally (if that’s your goal).

So be proud, but be humble. There is so much to learn that you will never know everything. And that’s a good thing, as long as you like learning.

Identify your goals

Maybe it’s obvious, but the path you take from now should be dictated by what you want to be able to do. If your goal is to write web applications, you should get to grips with frameworks such as Django, Flask and Bottle. If your goal is to use Python for data science, you should be learning things like NumPy, SciPy and pandas (and hitting the math books hard). If you want to create desktop software, you should learn PyQt and Tkinter.

Once you have decided what area (or areas) to focus on, go and find some beginner materials for them. I’m not going to list any here, because there are just so many, but rest assured that they are easy to find.

(That reminds me of another point: Google is your friend. Programmers like to joke that the job is 80% Googling, and there is some truth to that, depending on what you’re building. So you’d better improve your Google-fu and learn how to ask good questions.)

Projects are key

Working through courses and tutorials, and reading technical books, are important ways of improving your knowledge, but there is really no substitute for trying to build something and picking up what you need to know as you go along. The skills of identifying bugs quickly, deciphering arcane error messages and knowing when to stop fiddling with your code and move on are as important to programming as knowledge of advanced language features or algorithms.

With that in mind, I suggest you set yourself a project. Maybe you already have something in mind, in which case, awesome! Or maybe you don’t. In that case, take a look at this list of projects and pick one that seems fun and achievable.

http://www.dreamincode.net/forums/topic/78802-martyr2s-mega-project-ideas-list/

Learn some real computer science

Assuming you’re like a lot of the people picking up coding these days and your goal is to make web applications, then you can go a long way without even thinking about the theoretical underpinnings of programming.

Don’t be that guy. Apart from the often mooted practical considerations of writing efficient code or gaining access to higher paying jobs, there is a world of elegance and beauty in programming that you may not have expected. I am going to suggest a few resources for this because people have remarked that they don’t know where to start.

Code: The Hidden Language of Computer Hardware and Software

This is the amuse-bouche. Just read it straight through for fun and try to become inspired!

NAND2Tetris

I can’t say enough good things about this course. It took me from a vague understanding of how computers executed my code to being able to conceptualize from the ground up how it all works. If you follow the course the whole way through you will build a (simulated) computer from first principles, create an assembler for it, a compiler for a Java-like language, and a basic operating system. Do it!

Coursera – Algorithms Part 1

This course, taught by Robert Sedgewick, will teach you a basic algorithmic toolbox. The algorithms it deals with are sort of like a “Greatest Hits” collection. Knowledge of these algorithms and the design techniques behind them forms a kind of lingua franca among programmers. For that reason, they are also really popular interview questions.

Phew! If you get to grips with all that stuff, you’ll be doing well!

Learn source control

I often see people advising beginners to learn Git before they do anything else. Git is a great program, and version control is one of the key skills needed to develop software at a professional level, but I can’t agree with the sentiment. The fact is that Git is extremely complicated. Even experienced programmers end up having to trawl the documentation and StackOverflow to figure out how to do certain things. So I suggest you learn Mercurial instead. It has many of the advantages of Git while being much more user-friendly. As a bonus, it’s written in Python!

The key thing you need to acquire at this stage of your education is a habit of using version control to track the development of your personal projects. You need to go through the process of changing code, breaking your whole project and being able to revert to a working state so that you can really understand why people use version control. For that purpose, the steep learning curve of Git is just going to put you off.

I’m not saying you shouldn’t learn Git later. You should, especially if you want to participate in open source projects. But at that point knowing Mercurial is only going to help, because you will already understand the concepts.

Learn an IDE and/or a text editor

Now is also the time to learn about IDEs and text editors. If you’ve been hanging around on programming forums (you should be), you will no doubt have got an inkling of the holy war between Vi and Emacs users, and the strong opinions of people who think IDEs are unnecessary bloatware. I don’t want to come down on either side of these arguments. I just want to suggest that you are now at the point where you should be investigating them for yourself.

But for my money, PyCharm is an amazing piece of software. 😀

The end

I hope I’ve given you something to think about with this article. If it all seems like a lot of work, well, it is! I’ll let Peter Norvig provide some perspective.

As always, I’m happy to answer your emails.

Comparing files in Python using difflib

Everybody knows about the diff command in Linux, but not everybody knows that the Python standard library contains a module, difflib, that can produce the same kinds of diffs.

A basic diff utility

First, let’s see what a minimal diff implementation using difflib might look like:
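A minimal sketch of such a utility, assuming the two file paths are passed in as arguments (the function name diff_files is hypothetical):

```python
import difflib


def diff_files(path1, path2):
    """Return a context diff of two text files as a list of lines."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # context_diff yields the diff lazily; list() collects it.
    return list(difflib.context_diff(lines1, lines2, fromfile=path1, tofile=path2))
```

Calling print("".join(diff_files("file1.txt", "file2.txt"))) prints the diff. Reading the two paths from sys.argv would turn it into a small command-line tool.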

The context_diff function takes two sequences of strings – here provided by readlines – and optional fromfile and tofile keyword arguments, and returns a generator that yields strings in the “context diff” format, which is a way of showing changes plus a few neighbouring lines for context.

The library also supports other diff formats, such as ndiff.
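For comparison, here is a sketch that runs one small change through ndiff and through unified_diff, the format produced by diff -u. The word lists are made-up sample data.

```python
import difflib

old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# ndiff prefixes lines with "- " (only in old), "+ " (only in new) or "  " (in both),
# and adds "? " hint lines when two lines are similar enough to compare characters.
ndiff_lines = list(difflib.ndiff(old, new))

# unified_diff is the format used by `diff -u` and most patch tools.
unified_lines = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))

print("".join(ndiff_lines))
print("".join(unified_lines))
```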

Let’s use the utility to compare two versions of F. Scott Fitzgerald’s famous conclusion to The Great Gatsby.

The exclamation marks (!) denote the lines with changes on them. file1.txt is of course the version we know and love.
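The Gatsby passages aren't reproduced here. As a stand-in, this sketch runs context_diff over two made-up three-line texts and picks out the lines marked with "!":

```python
import difflib

before = ["first line\n", "second line\n", "third line\n"]
after = ["first line\n", "second line, edited\n", "third line\n"]

diff = list(difflib.context_diff(before, after, fromfile="file1.txt", tofile="file2.txt"))
print("".join(diff))

# Changed lines are prefixed with "! " in both the old and the new section.
changed = [line for line in diff if line.startswith("! ")]
print(changed)  # ['! second line\n', '! second line, edited\n']
```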

Fuzzy matches

That’s not all difflib can do. It also lets you check for “close enough” matches between text sequences.

When I first saw this, I immediately thought “Levenshtein distance”, but it actually uses a different algorithm. Here’s what the documentation says about it:

The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980’s by Ratcliff and Obershelp under the hyperbolic name “gestalt pattern matching”. The basic idea is to find the longest contiguous matching subsequence that contains no “junk” elements (R-O doesn’t address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people.
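Here are two ways to try fuzzy matching, using examples taken from the difflib documentation. get_close_matches returns the candidates closest to a word, and SequenceMatcher.ratio() returns the underlying similarity score.

```python
import difflib

# Candidates that are "close enough" to the word, best first
# (defaults: at most n=3 results with a similarity of at least 0.6).
suggestions = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(suggestions)  # ['apple', 'ape']

# The score is 2*M/T: M matched characters, T total characters in both strings.
# Here "bcd" matches, so 2*3/8 = 0.75.
ratio = difflib.SequenceMatcher(None, "abcd", "bcde").ratio()
print(ratio)  # 0.75
```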

HTML diffs

The module includes a class called HtmlDiff that can be used to generate diff tables for files. This would be useful, for instance, for building a front end to a code review tool. This is the coolest thing in the module, in my opinion.

The class also has a method called make_file that outputs an entire HTML file, not just the table.
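A short sketch of both methods on two made-up versions of a small Python function. fromdesc and todesc label the two columns.

```python
import difflib

old = ["def greet(name):\n", "    print('Hello ' + name)\n"]
new = ["def greet(name):\n", "    print(f'Hello {name}')\n"]

differ = difflib.HtmlDiff()

# make_table returns only the side-by-side <table> markup...
table = differ.make_table(old, new, fromdesc="old.py", todesc="new.py")

# ...while make_file wraps it in a complete HTML page with styles and a legend.
page = differ.make_file(old, new, fromdesc="old.py", todesc="new.py")
```

Writing page to a .html file and opening it in a browser shows the side-by-side view with the changes highlighted.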

Here is what the rendered table looks like:

[Screenshot: the rendered HtmlDiff table]

Go forth and diff!

There are a few other subtleties, but I have covered the main functionality in this post. Check out the official documentation for difflib here.

The bool function in Python

Python’s built-in bool function comes in pretty handy for checking the truth and falsity of different types of values.

First, let’s take a look at how True and False are represented in the language.

True and False are numeric values

Under the hood, bool is a subclass of int: True is equal to 1 and False is equal to 0. They can be used in numeric expressions, so True + True evaluates to 2.

They even compare equal to those integers: True == 1 and False == 0 both evaluate to True.

However, this is just a numeric comparison, not a check of truthiness, so 5 == True evaluates to False.
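A few expressions that show this behaviour, runnable in the interactive interpreter or as a script:

```python
# bool is a subclass of int, so True and False behave as 1 and 0 in arithmetic.
print(True + True)               # 2
print(False * 10)                # 0
print(sum([True, False, True]))  # 2: a handy way to count true values

# Comparing with their integer values succeeds...
print(True == 1, False == 0)     # True True

# ...but == is a numeric comparison, and 5 is not equal to 1.
print(5 == True)                 # False
print(isinstance(True, int))     # True
```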

bool to the rescue

The number 5 would normally be considered to be a truthy value. To get at its inherent truthiness, we can run it through the bool function.
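Running a few values through bool shows how truthiness differs from equality with True:

```python
print(bool(5))     # True
print(bool(0))     # False
print(bool(-1))    # True: any non-zero number is truthy
print(bool("no"))  # True: non-empty strings are truthy, whatever they say
```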

The following are always considered false:

  • None
  • False
  • Any numeric zero: 0, 0.0, 0j
  • Empty sequences and collections: "", (), [], set(), range(0)
  • Empty dictionaries: {}
  • Instances of classes whose __bool__() method returns False or, if __bool__() is not defined, whose __len__() method returns 0.

Everything else is considered true.
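A quick check of the list above. The Basket class is a hypothetical container whose truthiness comes from its __len__ method.

```python
class Basket:
    """Hypothetical container; bool() falls back to __len__ since there is no __bool__."""

    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


falsy = [None, False, 0, 0.0, 0j, "", (), [], {}, set(), range(0), Basket()]
print([bool(value) for value in falsy])  # every entry is False

print(bool(Basket(["apple"])))  # True: __len__ returns 1
```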

Python Regular Expression Basics

Regular expressions are one of those topics that confuse even advanced programmers. Many people have programmed professionally for years without getting to grips with them. Too often, people just copy and paste regexes from Stack Overflow or other websites without really understanding what’s going on. In this article, I’m going to explain regular expressions from scratch and introduce you to Python’s implementation of them in the re module.

Regular expressions describe sets of strings

A regular expression is a description of a set of strings. Regular expression matching is a method of finding out if a given string is in the set defined by a certain regular expression. Regular expression search is a method of finding occurrences of strings belonging to that set inside a larger string. Python’s re module provides facilities for search, matching and replacing matched substrings with something else.

The simplest regular expression is just a sequence of ordinary characters. Ordinary characters are those characters that do not have a special meaning in the regular expression syntax.

The re.match function returns a match object if the pattern matches at the beginning of the text. Otherwise it returns None. The pattern does not have to cover the whole text, only its start.
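A small sketch of re.match using the word “cheese” as the pattern (the sample strings are made up). re.fullmatch, available since Python 3.4, is shown alongside for contrast.

```python
import re

# re.match only looks for the pattern at the start of the string.
print(bool(re.match(r"cheese", "cheese")))        # True
print(bool(re.match(r"cheese", "chalk")))         # False: re.match returned None
print(bool(re.match(r"cheese", "cheese board")))  # True: the rest is ignored
print(bool(re.match(r"cheese", "blue cheese")))   # False: not at the start

# re.fullmatch requires the pattern to cover the whole string.
print(bool(re.fullmatch(r"cheese", "cheese board")))  # False
```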

Notice how I put the r prefix before the pattern string. Regular expression syntax uses backslash sequences such as \d and \b, and some of them coincide with Python’s own string escape sequences (in a normal string literal, \b is a backspace character). To prevent those portions of the string from being interpreted as escape sequences, we use the raw r prefix. We don’t actually need it for this pattern, but I always put it in for consistency.

To search in a larger string, we use re.search.
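Searching a made-up sentence for the same pattern:

```python
import re

text = "I would like some cheese, please."

match = re.search(r"cheese", text)
print(match.group())               # cheese
print(match.start(), match.end())  # 18 24

print(re.search(r"brie", text))    # None: no occurrence anywhere
```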

So far this is not very useful. We are just matching strings against other strings, which can be achieved more easily with == and in.

However, regular expressions really come into their own when we start using sets of characters and repetitions.

Sets of characters and repetitions

Let’s say we don’t just want to match the string "cheese", but any string of lowercase alphabetic characters that is six characters long. In that case we can use the pattern "[a-z]{6}". The bit in the square brackets – [a-z] – means that we should match any lowercase alphabetic character from a to z. The bit in the curly brackets – {6} – means that the match should repeat six times.
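Trying the pattern on a few sample words. re.fullmatch is used here so that the whole string has to match. re.match would also accept a longer word such as “cheeses”, because it only checks the beginning.

```python
import re

pattern = r"[a-z]{6}"

print(bool(re.fullmatch(pattern, "cheese")))  # True
print(bool(re.fullmatch(pattern, "butter")))  # True
print(bool(re.fullmatch(pattern, "brie")))    # False: only four characters
print(bool(re.fullmatch(pattern, "Cheese")))  # False: "C" is not in a-z
```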

The dot character . matches any character except a newline.

By the way, if you want to match the dot character itself, you will have to escape it. Special characters can be escaped and made to match their ordinary equivalents by putting a backslash \ before them.
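A sketch contrasting the bare dot with the escaped \. (sample strings made up):

```python
import re

# "." matches any single character except a newline.
print(bool(re.fullmatch(r"c.t", "cat")))   # True
print(bool(re.fullmatch(r"c.t", "c t")))   # True
print(bool(re.fullmatch(r"c.t", "c\nt")))  # False: newline is excluded

# "\." matches only a literal dot.
print(bool(re.fullmatch(r"file\.txt", "file.txt")))  # True
print(bool(re.fullmatch(r"file\.txt", "fileatxt")))  # False
print(bool(re.fullmatch(r"file.txt", "fileatxt")))   # True: the unescaped dot matches "a"
```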

Other restricted sets of characters can be defined, such as the set of all digits – [0-9] – and the set of all alphanumeric characters – [a-zA-Z0-9]. There are some shorthand ways of specifying common sets of characters too. For instance, \w stands for “word characters”. With the re.ASCII flag it is equivalent to the set [a-zA-Z0-9_], i.e. every alphanumeric character and the underscore. For str patterns in Python 3, it also matches other Unicode letters and digits, such as accented letters.
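The character classes in action on a few sample strings, including the difference re.ASCII makes to \w:

```python
import re

print(re.findall(r"[0-9]+", "Order 66 shipped 3 items"))  # ['66', '3']
print(re.findall(r"\w+", "snake_case, x2!"))              # ['snake_case', 'x2']

# For str patterns, \w is Unicode-aware by default...
print(re.findall(r"\w+", "café"))                   # ['café']
# ...and is limited to [a-zA-Z0-9_] with the re.ASCII flag.
print(re.findall(r"\w+", "café", flags=re.ASCII))   # ['caf']
```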

For a full list of the special character classes supported by re, use the help function.

* and +

So far we have learned how to match a set of characters a specific number of times, but what if we want to match it an indeterminate number of times? That’s where * and + come in.

* is known as the Kleene star, after Stephen Kleene, who invented this notation to describe finite state automata. It means “match the previous character or set of characters zero or more times”.

For instance, the regex a* matches the following strings:

  • ""
  • "a"
  • "aa"
  • "aaa"
  • etc.

The + character, on the other hand, means “match the previous character or set of characters one or more times”.

So, the regex b+ matches the following strings:

  • "b"
  • "bb"
  • "bbb"
  • etc.

Unlike the previous pattern, this one does not match the empty string.
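Checking the strings from both lists with re.fullmatch, so that the entire string must match the pattern:

```python
import re

star = {s: bool(re.fullmatch(r"a*", s)) for s in ["", "a", "aa", "aaa", "ab"]}
plus = {s: bool(re.fullmatch(r"b+", s)) for s in ["", "b", "bb", "bbb"]}

print(star)  # {'': True, 'a': True, 'aa': True, 'aaa': True, 'ab': False}
print(plus)  # {'': False, 'b': True, 'bb': True, 'bbb': True}
```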

Repeating matches between x and y times

We’ve seen how to match characters either a definite number of times or an unlimited number of times, but we can also restrict the length of the match using the {x,y} syntax (with no space after the comma), where x is the lower limit and y is the upper limit.

The pattern a{3,5} will match strings composed of the character a repeated between three and five times.

The strings below match the pattern:

  • "aaa"
  • "aaaa"
  • "aaaaa"

However, the strings "aa" and "aaaaaa" do not match.
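The same check for a{3,5}, plus re.search to show that a longer string can still contain a match:

```python
import re

results = {s: bool(re.fullmatch(r"a{3,5}", s)) for s in ["aa", "aaa", "aaaa", "aaaaa", "aaaaaa"]}
print(results)
# {'aa': False, 'aaa': True, 'aaaa': True, 'aaaaa': True, 'aaaaaa': False}

# re.search only needs a matching substring, so it finds "aaaaa" inside "aaaaaa".
print(re.search(r"a{3,5}", "aaaaaa").group())  # aaaaa
```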

Excluding characters

Until now, we’ve been defining sets of characters by the characters that are included in them, but we can also define sets of excluded characters. That is done using the caret ^ character _inside_ the square brackets.

For example, the pattern [^abc]+ matches any string of length one or more that does not contain the characters a, b or c.
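A few made-up sample strings tested against [^abc]+:

```python
import re

pattern = r"[^abc]+"

print(bool(re.fullmatch(pattern, "xyz")))    # True
print(bool(re.fullmatch(pattern, "xaz")))    # False: contains "a"
print(bool(re.fullmatch(pattern, "")))       # False: + needs at least one character
print(re.findall(pattern, "cabbage patch"))  # ['ge p', 't', 'h']
```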

Matching the start and end of strings

Regular expressions also support another feature called “anchors”. The caret ^ and the dollar sign $ are the two most common anchors, used to match the start and end of strings respectively. This feature is relevant for searching within strings rather than matching the whole string.

Consider a search string that contains “cheese” twice, once at the very start and once at the very end.

The first pattern, ^cheese, matches the first occurrence of the substring "cheese" within the search string. The second pattern, cheese$, matches the second occurrence.
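Since the example string isn't shown, this sketch assumes a made-up search string with “cheese” at both ends and prints the position of each match:

```python
import re

text = "cheese on toast with cheese"

start = re.search(r"^cheese", text)
end = re.search(r"cheese$", text)

print(start.span())  # (0, 6): the occurrence at the start
print(end.span())    # (21, 27): the occurrence at the end

# The anchor fails when "cheese" is not at the start of the string.
print(re.search(r"^cheese", "toast with cheese"))  # None
```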

By using the two anchors together, we can match the whole string. For example, the pattern ^cheese.*cheese$ matches any string that starts with "cheese" and ends with another "cheese".
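A sketch using ^cheese.*cheese$ as one possible way to express this. Note that the pattern needs two separate occurrences, so the bare string “cheese” does not match. Making the middle part optional handles that case too.

```python
import re

pattern = r"^cheese.*cheese$"

print(bool(re.search(pattern, "cheese on toast with cheese")))  # True
print(bool(re.search(pattern, "toast with cheese")))            # False
print(bool(re.search(pattern, "cheese")))                       # False: only one "cheese"

# Optional middle group: also accepts a single "cheese".
print(bool(re.search(r"^cheese(.*cheese)?$", "cheese")))        # True
```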

Matching this or that

Sometimes we want to build a pattern that says “match this string, or match that string”. For that, we use the pipe | character.

We can confine it to a certain region using round brackets.
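A short illustration of alternation, with and without round brackets (sample strings made up):

```python
import re

# Match either "cat" or "dog" anywhere in the string.
print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# Round brackets confine the alternation to part of the pattern.
print(bool(re.fullmatch(r"(cat|dog) food", "dog food")))  # True

# Without them, | splits the whole pattern: "cat" OR "dog food".
print(bool(re.fullmatch(r"cat|dog food", "cat food")))    # False
```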

Optional items

What if we wanted to match both the American and British spellings of the word “harbour”? The American version has no “u” in it. This is where the optional character ? comes in useful.

In a pattern such as harbou?r, the ? makes the preceding character u optional: it matches zero times or once.
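Testing harbou?r against both spellings, plus a misspelling:

```python
import re

pattern = r"harbou?r"

print(bool(re.fullmatch(pattern, "harbour")))   # True: British spelling
print(bool(re.fullmatch(pattern, "harbor")))    # True: American spelling
print(bool(re.fullmatch(pattern, "harbouur")))  # False: ? allows at most one "u"
```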

Groups and named groups

Parts of a regex pattern bounded by round brackets are called “groups”.

These groups are numbered and can be accessed using indexes, but it is also possible to create named groups with the (?P<name>...) syntax. These are accessible by name rather than just by an index.
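A sketch of numbered and named groups on a made-up date string:

```python
import re

date = "2015-03-14"

# Numbered groups: group(0) is the whole match, group(1) the first bracketed part, etc.
m = re.match(r"(\d{4})-(\d{2})-(\d{2})", date)
print(m.group(1), m.group(2), m.group(3))  # 2015 03 14
print(m.groups())                          # ('2015', '03', '14')

# Named groups can be read by name.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date)
print(m.group("year"))  # 2015
print(m.groupdict())    # {'year': '2015', 'month': '03', 'day': '14'}
```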

Greedy and non-greedy matching

The normal way for regex searches to work is greedily, i.e. matching as much of the search string as possible. Take, for example, a string such as <h1>Title</h1> and the pattern <.*>.

The pattern <.*> matches the whole string, right up to the last occurrence of >. However, if we only wanted to match the first <h1> tag, we can use the non-greedy qualifier *?, which matches as little text as possible.

Now we’re only matching the first tag.
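Assuming a string such as <h1>Title</h1>, here are the greedy and non-greedy patterns side by side:

```python
import re

html = "<h1>Title</h1>"

print(re.search(r"<.*>", html).group())   # <h1>Title</h1>: greedy, runs to the last ">"
print(re.search(r"<.*?>", html).group())  # <h1>: non-greedy, stops at the first ">"
print(re.findall(r"<.*?>", html))         # ['<h1>', '</h1>']
```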

The end

We certainly haven’t covered everything there is to know about regular expressions in this post, but we’ve covered enough to decipher the vast majority of patterns found in the wild, and to invent our own without falling back on cargo-cult copying and pasting.

However, there’s no need to reinvent the wheel, so if you find a good regex that does what you need, you may as well swipe it. Before you do though, test it out with a tool like RegExr or similar.

Python descriptors made simple

Descriptors, introduced in Python 2.2, provide a way to add managed attributes to objects. They are not used much in everyday programming, but it’s important to learn them to understand a lot of the “magic” that happens in the standard library and third-party packages.

The problem

Imagine we are running a bookshop with an inventory management system written in Python. The system contains a class called Book that captures the author, title and price of physical books.

Our simple Book class works fine for a while, but eventually bad data starts to creep into the system. The system is full of books with negative prices or prices that are too high because of data entry errors. We decide that we want to limit book prices to values between 0 and 100. In addition, the system contains a Magazine class that suffers from the same problem, so we want our solution to be easily reusable.

The descriptor protocol

The descriptor protocol is simply a set of methods; a class that implements any of them qualifies as a descriptor. There are three of them:

  • __get__(self, instance, owner)
  • __set__(self, instance, value)
  • __delete__(self, instance)

__get__ accesses a value stored in the object and returns it.

__set__ sets a value stored in the object and returns nothing.

__delete__ deletes a value stored in the object and returns nothing.

Using these methods, we can write a descriptor called Price that limits the value stored in it to between 0 and 100.

A few details in the implementation of Price deserve mentioning.

An instance of a descriptor must be added to a class as a class attribute, not as an instance attribute. Therefore, to store different data for each instance, the descriptor needs to maintain a dictionary that maps instances to instance-specific values. In the implementation of Price, that dictionary is self.values.

A normal Python dictionary stores references to objects it uses as keys. Those references by themselves are enough to prevent the object from being garbage collected. To prevent Book instances from hanging around after we are finished with them, we use the WeakKeyDictionary from the weakref standard module. Once the last strong reference to the instance passes away, the associated key-value pair will be discarded.

Using descriptors

As we saw in the last section, descriptors are linked to classes, not to instances, so to add a descriptor to the Book class, we must add it as a class variable.

The price constraint for books is now enforced.
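Putting the pieces together, here is one way Price and Book could be written, following the details described above. The default value of 0, the error message, and the instance is None check (which returns the descriptor itself when the attribute is looked up on the class) are my own assumptions.

```python
from weakref import WeakKeyDictionary


class Price:
    """Descriptor that limits a price to the range 0 to 100, per instance."""

    def __init__(self, default=0):
        self.default = default
        # One descriptor serves every Book, so values are kept per instance.
        # Weak keys let Book instances be garbage collected normally.
        self.values = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            # Looked up on the class itself (Book.price): return the descriptor.
            return self
        return self.values.get(instance, self.default)

    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError("Price must be between 0 and 100.")
        self.values[instance] = value

    def __delete__(self, instance):
        del self.values[instance]


class Book:
    price = Price()  # a class attribute, shared by all Book instances

    def __init__(self, author, title, price):
        self.author = author
        self.title = title
        self.price = price  # routed through Price.__set__


b = Book("A. Author", "A Title", 12)
b.price = 23
print(b.price)  # 23

try:
    b.price = -5
except ValueError as exc:
    print(exc)  # Price must be between 0 and 100.
```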

How descriptors are accessed

So far we’ve managed to implement a working descriptor that manages the price attribute on our Book class, but how it works might not be clear. It all feels a bit too magical, but not to worry. It turns out that descriptor access is quite simple:

  • When we try to evaluate b.price and retrieve the value, Python recognizes that price is a descriptor and calls Book.price.__get__.
  • When we try to change the value of the price attribute, e.g. b.price = 23, Python again recognizes that price is a descriptor and substitutes the assignment with a call to Book.price.__set__.
  • And when we try to delete the price attribute stored against an instance of Book, Python automatically interprets that as a call to Book.price.__delete__.

The number 1 descriptor gotcha

Unless we fully understand the fact that descriptors are linked to classes and not to instances, and therefore need to maintain their own mapping of instances to instance-specific values, we might be tempted to write the Price descriptor as follows:

But once we start instantiating multiple Book instances, we’re going to have a problem.

The key is to understand that there is only one instance of Price for Book, so every time the value in the descriptor is changed, it changes for all instances. That behaviour in itself is useful for creating managed class attributes, but it is not what we want in this case. To store separate instance-specific values, we need to use the WeakKeyDictionary.
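A sketch of the tempting but broken version (the name SharedPrice is hypothetical). Because the value lives on the single descriptor object, the second Book overwrites the first one’s price.

```python
class SharedPrice:
    """Buggy descriptor: stores a single value on the descriptor itself."""

    def __init__(self, default=0):
        self.value = default

    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError("Price must be between 0 and 100.")
        # Bug: there is one descriptor per class, so all instances share this value.
        self.value = value


class Book:
    price = SharedPrice()

    def __init__(self, title, price):
        self.title = title
        self.price = price


cheap = Book("Pamphlet", 5)
dear = Book("Art book", 95)
print(cheap.price, dear.price)  # 95 95: the second assignment overwrote the first
```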

The property built-in function

Another way of building descriptors is to use the property built-in function. Here is the function signature:

property(fget=None, fset=None, fdel=None, doc=None)

fget, fset and fdel are methods to get, set and delete attributes, respectively. doc is a docstring.

Instead of defining a single class-level descriptor object that manages instance-specific values, property works by combining instance methods from the class. Here is a simple example of a Publisher class from our inventory system with a managed name property. Each method passed into property has a print statement to illustrate when it is called.

If we make an instance of Publisher and access the name attribute, we can see the appropriate methods being called.
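A sketch of what such a Publisher class might look like. The method names get_name, set_name and del_name, and the _name storage attribute, are assumptions.

```python
class Publisher:
    def __init__(self, name):
        self._name = name

    def get_name(self):
        print("Getting name")
        return self._name

    def set_name(self, value):
        print("Setting name to", value)
        self._name = value

    def del_name(self):
        print("Deleting name")
        del self._name

    # property combines the three methods into one managed attribute.
    name = property(get_name, set_name, del_name, "The publisher's name.")


p = Publisher("Acme Press")
p.name               # prints "Getting name"
p.name = "Example"   # prints "Setting name to Example"
del p.name           # prints "Deleting name"
```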

That’s it for this basic introduction to descriptors. If you want a challenge, take what you have learned and try to reimplement the @property decorator. There is enough information in this post to allow you to figure it out.

A quick guide to nonlocal in Python 3

Python 3 introduced the nonlocal keyword, which allows you to assign to variables in an outer, but non-global, scope. An example will illustrate what I mean.

msg is declared in the outside function and assigned the value "Outside!". Then, in the inside function, the value "Inside!" is assigned to it. When we run outside, msg has the value "Inside!" in the inside function, but retains the old value in the outside function.

We see this behaviour because Python hasn’t actually assigned to the existing msg variable, but has created a new variable called msg in the local scope of inside that shadows the name of the variable in the outer scope.
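A sketch of the example described above. Instead of printing, the functions return the values so that both scopes can be compared.

```python
def outside():
    msg = "Outside!"

    def inside():
        msg = "Inside!"  # creates a new local variable that shadows the outer msg
        return msg

    inner_value = inside()
    return inner_value, msg


print(outside())  # ('Inside!', 'Outside!')
```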

Preventing that behaviour is where the nonlocal keyword comes in.

Now, by adding nonlocal msg to the top of inside, Python knows that when it sees an assignment to msg, it should assign to the variable from the outer scope instead of declaring a new variable that shadows its name.
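The same sketch with nonlocal added. Now the assignment inside inside() rebinds the outer msg.

```python
def outside():
    msg = "Outside!"

    def inside():
        nonlocal msg  # assign to the msg in outside(), not a new local
        msg = "Inside!"
        return msg

    inner_value = inside()
    return inner_value, msg


print(outside())  # ('Inside!', 'Inside!')
```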

The usage of nonlocal is very similar to that of global, except that the former is used for variables in outer function scopes and the latter is used for variables in the global scope.

Some confusion might arise about when nonlocal should be used. Take the following function, for instance.

It would be reasonable to expect that without using nonlocal the insertion of the "inside": 2 key-value pair in the dictionary would not be reflected in outside. Reasonable, but incorrect, because the dictionary insertion is not an assignment, but a method call. In fact, inserting a key-value pair into a dictionary is equivalent to calling the __setitem__ method on the dictionary object.
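A sketch of that situation, assuming a dictionary that starts with an "outside": 1 entry:

```python
def outside():
    d = {"outside": 1}

    def inside():
        # No assignment to the name d here, only a method call on the object
        # it refers to, so no nonlocal declaration is needed.
        d["inside"] = 2  # equivalent to d.__setitem__("inside", 2)

    inside()
    return d


print(outside())  # {'outside': 1, 'inside': 2}
```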

I will leave it there for now. If you want to learn more about the nonlocal keyword, check out PEP 3104.

The two ways to sort a list in Python

Today I’m going to take a look at another element of the language that tends to trip up Python beginners – the difference between sorted(my_list) and my_list.sort().

The built-in function sorted sorts the list (or any other iterable) that is passed into it, and returns a new list while preserving the old one.

On the other hand, the sort method on list objects sorts the list in place, destroying the original ordering.

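Using a list’s sort method is roughly equivalent to assigning the output of sorted back to the original list. The difference is that sort modifies the existing list object, so any other name that refers to the same list sees the new order too.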
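Both approaches on a small sample list:

```python
numbers = [3, 1, 2]

new_list = sorted(numbers)
print(new_list)  # [1, 2, 3]
print(numbers)   # [3, 1, 2]: the original is untouched

result = numbers.sort()
print(numbers)   # [1, 2, 3]: sorted in place
print(result)    # None: list.sort returns nothing, so don't write x = x.sort()

# sorted accepts any iterable and always returns a list.
print(sorted("cab"))             # ['a', 'b', 'c']
print(sorted({3: "c", 1: "a"}))  # [1, 3]: iterating a dict yields its keys
```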

However, that particular way of doing things is frowned upon. Only use sorted when you need to keep the original list; otherwise, call my_list.sort().

sorted and list.sort both accept the key and reverse parameters. The cmp parameter, which allowed you to pass in a custom comparator function, has been removed in Python 3. key should be used instead.
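A few key and reverse examples on a made-up word list. The example also shows functools.cmp_to_key, which wraps an old-style cmp function so that it can be passed as key.

```python
from functools import cmp_to_key

words = ["banana", "apple", "Cherry"]

print(sorted(words))                         # ['Cherry', 'apple', 'banana']: uppercase sorts first
print(sorted(words, key=str.lower))          # ['apple', 'banana', 'Cherry']
print(sorted(words, key=len, reverse=True))  # ['banana', 'Cherry', 'apple']: ties keep their order


def compare_length(a, b):
    """Old-style comparator: negative, zero or positive."""
    return len(a) - len(b)


print(sorted(words, key=cmp_to_key(compare_length)))  # ['apple', 'banana', 'Cherry']
```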

The difference between range and xrange in Python

Today I’m going to take a look at another difference between Python 2 and 3 that can trip up people making the switch. Python 2 had two functions that could be used to iterate a certain number of times in for loops, range and xrange. In Python 3, there is no xrange, but the range function behaves like xrange in Python 2.

The way things were

You probably remember that in Python 2 you could generate indexes in for loops in two ways: for i in range(n) and for i in xrange(n).

The difference between these two built-in functions is not immediately obvious when used in this way. Let’s take a look at the output of each function in the interactive interpreter: in Python 2, range(5) evaluates to [0, 1, 2, 3, 4], while xrange(5) evaluates to xrange(5).

As you can see, range returns a normal list, but xrange returns an xrange object. An xrange object is a lazy sequence: like a generator, it produces the necessary index on demand instead of producing the entire list up front. Therefore it can be slightly faster and more memory efficient. According to the Python 2 documentation, the xrange type offers the following guarantee:

The advantage of the xrange type is that an xrange object will always take the same amount of memory, no matter the size of the range it represents.

xrange removed in Python 3

In Python 3, xrange has been removed and the only option for generating iterable sequences of consecutive numbers is range. Actually, it is more correct to say that the Python 2 range function has been removed and xrange has been renamed to range.

For the most part, this change is easy to handle: just use range when you would have used either range or xrange in Python 2. The only place you might be tripped up is if you actually need the list that range used to return. Luckily, all you have to do in that case is pass the Python 3 range object to the list constructor function.
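A quick look at the Python 3 range object. It supports len, membership tests and indexing without ever building a list.

```python
r = range(5)
print(r)        # range(0, 5): a lazy range object, not a list
print(list(r))  # [0, 1, 2, 3, 4]

big = range(10**12)
print(len(big))    # 1000000000000, without building a list
print(999 in big)  # True, computed arithmetically
print(big[-1])     # 999999999999
```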

Pythonic iteration

Before I finish, I’ll just mention a way to make your code more Pythonic. Quite frequently, when people want to use any of the range functions, it is because they want a way to index another sequence type, e.g. looping over range(len(seq)) and reading seq[i] inside the loop.

Sometimes people even declare a counter variable outside the loop, just so they have an index.

There is no need to do either of these things. In particular, that range(len(seq)) idiom is one of the classic markers of amateur Python code. What you really need is the enumerate function, which automatically generates an index for whatever sequence you are iterating over.
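Both loops side by side on a sample list:

```python
fruits = ["apple", "banana", "cherry"]

# The range(len(...)) idiom:
for i in range(len(fruits)):
    print(i, fruits[i])

# The same loop with enumerate: index and item together.
for i, fruit in enumerate(fruits):
    print(i, fruit)

# enumerate can also start counting from another number.
print(list(enumerate(fruits, start=1)))  # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
```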

Ta-da! Once you start using enumerate, you’ll never go back.

Private methods and attributes in Python

Unlike Java, which enforces access restrictions on methods and attributes, Python takes the view that we are all adults and should be allowed to use the code as we see fit. Nevertheless, the language provides a few facilities to indicate which methods and attributes are public and which are private, and some ways to dissuade people from accessing and using private things.

Normal attribute access

Let’s take a look at how normal attribute access works.

As we can see, there are no restrictions on accessing or assigning to the bar attribute of our instance. The attribute is also included in __dict__.
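The original example isn't shown. A minimal Foo class with a public bar attribute behaves like this:

```python
class Foo:
    def __init__(self):
        self.bar = 1


foo = Foo()
print(foo.bar)       # 1
foo.bar = 2          # assignment from outside the class is unrestricted
print(foo.bar)       # 2
print(foo.__dict__)  # {'bar': 2}
```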

Making it private

Now let’s make bar “private”. We can do that by adding two leading underscores to the name.

What has happened here is that the name of __bar has been changed by the interpreter so that it is not easily accessible outside the class. If we take a look at __dict__ again, we will see that it has been renamed to _Foo__bar, and can be accessed and assigned using that name.

This is called “name mangling”. Attributes whose names start with two underscores, and do not end with two underscores, are renamed in the format _classname__attrname.

We only have to use the mangled name outside the class. Inside, we access the attribute in the normal way.
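Here is the same class with a double-underscore attribute. The method double_bar is a hypothetical example of accessing the attribute from inside the class.

```python
class Foo:
    def __init__(self):
        self.__bar = 1  # stored as _Foo__bar

    def double_bar(self):
        return self.__bar * 2  # inside the class the plain name works


foo = Foo()
print(foo.__dict__)     # {'_Foo__bar': 1}
print(foo._Foo__bar)    # 1
print(foo.double_bar())  # 2

try:
    foo.__bar  # no mangling outside the class, so this name doesn't exist
except AttributeError as exc:
    print("AttributeError:", exc)
```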

Getters and setters

After learning about “private” attributes, new Python programmers sometimes get the idea that they can use getters and setters to manage accessing and assigning attributes, so they write methods such as get_bar() and set_bar() and call those instead of touching the attribute directly.

It might work, but it’s not Pythonic. Direct attribute access is the natural and Pythonic way to do things, so any solution for mediating attribute access should keep that interface. There are a few ways to do it, such as overriding __getattr__ and __setattr__, but the best way is to use managed attributes created with the property decorator.
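A minimal sketch of a managed bar attribute built with the property decorator and stored in __bar. The non-negative check is a hypothetical example of the validation a setter might do.

```python
class Foo:
    def __init__(self, bar):
        self.bar = bar  # goes through the setter below

    @property
    def bar(self):
        return self.__bar

    @bar.setter
    def bar(self, value):
        # Validation, logging, etc. can go here (this check is just an example).
        if value < 0:
            raise ValueError("bar must not be negative")
        self.__bar = value


foo = Foo(1)
foo.bar = 5          # plain attribute syntax, but the setter runs
print(foo.bar)       # 5
print(foo.__dict__)  # {'_Foo__bar': 5}
```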

Here we have created a managed bar attribute that stores its data in the private __bar attribute. When getting and setting the value of __bar, we can run whatever code we want for validation, logging, etc., provided we go through the interface provided by the two decorated bar functions. Useful, eh?

Private methods

Methods can be made private in the same way, by naming them with two leading underscores and no trailing underscores.

And just like private attributes, they are accessible by name inside the class.
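A sketch of a private method. __helper and public are hypothetical names.

```python
class Foo:
    def __helper(self):
        return "private result"

    def public(self):
        # Inside the class the method is called by its plain name.
        return self.__helper()


foo = Foo()
print(foo.public())              # private result
print(foo._Foo__helper())        # private result: the mangled name still works
print(hasattr(foo, "__helper"))  # False
```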

A word about single underscores

So far we have dealt with names that start with two underscores, but it’s quite common to see names that start with a single underscore. They are not private in the same sense. Name mangling does not occur. A single underscore is mostly just a weak indication that the thing in question is meant to be used internally and is not part of the public interface of the class, module, etc., that it is inside.

In classes, attributes and methods that start with a single underscore are treated normally.

However, single underscores are not purely a stylistic thing. They do affect how the import  statement works.

PEP 8 says:

_single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.

This means that if we have a function called _hello_world in a module called helloworld, and we import * from it, then the _hello_world function will not be pulled into the current scope.

It is possible to override the default hiding of objects with single leading underscores. __all__ is a list of the names of public objects exported by a module. When a module defines __all__, a wildcard import pulls in exactly the names it lists, so if we add '_hello_world' to the list, then it will be pulled in with the wildcard import.

The single underscore only affects wildcard imports, which we should avoid anyway. We can still grab the function specifically using from helloworld import _hello_world.

And that’s pretty much all you need to know about private attributes in Python!