How to write a Reddit bot in Python

Something I have seen a lot of interest in is writing bots to interact with Reddit and provide useful services to the community. In this post, I’m going to show you how to build one.

Introducing BitesizeNewsBot

The bot we’re going to write is called BitesizeNewsBot. It sits on the “new” queue in the /r/worldnews subreddit and posts summaries of the articles that people link to.

I ran it for a few days last week and, after I worked out some of the kinks, it was ticking along nicely.

The code

Here is the full code for the bot. It’s under 80 lines! Take a minute to read it and then we will step through what it is doing.

Connecting to Reddit using PRAW

Due to some heroic open source contributions, the Python Reddit API Wrapper is a really mature and stable library that gives you access to everything in the Reddit API. The library even has its own subreddit – /r/praw . We’re going to use it to login using the bot’s account, periodically fetch the new submissions in /r/worldnews, and post comments on the submissions that contain compact summaries of the linked articles.

Here’s how we log in with PRAW:

Then we enter a loop of fetching the new submissions from /r/worldnews, summarizing them, and posting them back as comments. On each iteration, we sleep for ten minutes to be a good citizen of Reddit.

This line fetches the ten newest submissions:

When we get the submissions, we can iterate over them and prepare the summaries.

Summarizing text using PyTLDR

Automatic text summarization is a topic I am really interested in. I’ve implemented several summarization algorithms, but the point of this post is to show you how to make a bot, not how to do advanced natural language processing, so we’re going to use a great library called PyTLDR.

PyTLDR implements several summarization algorithms, but the one we’re going to use is TextRank. The summarization function looks like this:

The summarize_web_page  function takes either a string containing the article text, of a URL. If we give it a URL, as we are doing here, it uses Goose extractor behind the scenes to fetch the article text from the web page.

The function also takes a length parameter. If this is a value between zero and one, it represents the summary length as a fraction of the length of the original article. If it is greater than one, it represents a number of sentences. We have picked three as our summary length, which seems to strike the right balance between providing a useful summary and copying large pieces of the article.

The output of the summary function is a list of sentences. Before returning the summary from the function, we join them with newlines.

In the main loop, we call the function as follows:

Commenting on submissions

Once we have got the summary, we can generate the comment and post it on the article submission. Before we do that, though, we have to do a sanity check on the summary. Because article extraction from web pages is inherently unreliable, sometimes the summarize_web_page  function will return an empty string. This piece of code in our main loop checks for that case and moves on to the next submission if we can’t generate a sensible summary for the current one:

Posting the comment can fail in many ways, so we need to catch several exceptions. As we want the same handler for each one (just print the exception and move on to the next iteration of the loop), we can catch them all in one line:

Keeping track of seen submissions

We don’t want the bot to comment more than once on any article, so we keep track of them in a set that stores the unique identifiers of each post once the comment with the summary has been posted.

To persist the set of posts from one run of the program to the next, we will pickle it and store it in a file on disk.

At startup, we try to restore the set from disk, like so:

At shutdown, we store the set on disk:

We are using the register decorator from the atexit  module to make sure that no matter how our program quits the save_seen_posts  function is called. It will be called even if you hit Ctrl-C in the terminal.

In the main loop, we add the submission ID to the SEEN  set right after posting the comment:

With the set in place, we check whether the bot has already commented on the submission before trying to generate a summary:

We also check if the submission is a self-post, because by definition they do not link to any news article.

And that’s really all there is to writing a simple Reddit bot. You can make it more complicated if you want, but BitesizeNewsBot demonstrates the basics.

Comments are closed.