Resources for learning data science in Python

Over the last few years I’ve been playing around with NumPy, SciPy, scikit-learn and other Python libraries for data science and machine learning.

In the process, I’ve collected a bunch of nice resources that should be useful to anybody trying to get to grips with these topics in Python.

Tutorials

Tentative NumPy Tutorial – This is the NumPy tutorial from the SciPy wiki. It covers the basics and is written in a cookbook style, so it’s ideal for use as a reference. One to bookmark, for sure.

Python Scientific Lecture Notes – A really comprehensive set of notes that goes from basic NumPy and advanced standard Python features to symbolic mathematics, image processing and machine learning using scikit-learn, scikit-image and SymPy.

Quantitative Economics with Python – This site not only contains an in-depth introduction to scientific computing in Python with applications to quantitative economics, but also touches on Pandas and IPython Notebooks, which are quickly becoming the standard for sharing computational ideas in Python.

NumPy for Matlab Users – Although my own foray into Matlab was limited to going through the Octave code for Stanford’s Machine Learning MOOC a few years ago, this tutorial has been recommended for people making the transition from Matlab to Python.

100 NumPy Exercises – Nicolas Rougier has put together a list of 100 exercises, graded from beginner to advanced levels, to teach people how to perform matrix operations the NumPy way. A great hands-on way to get to grips with the library.

Data Manipulation in Python – Mostly brief tutorials on manipulating and visualizing data from CSV files using Pandas.

Computational Statistics in Python – Ridiculously comprehensive.

Beat Detection Algorithms – Short blog post about automatically detecting the tempo of a piece of music. Not Python, but still interesting.

Beat Detection Algorithms, Part II – Second part of the above post.

Gensim Tutorial – Gensim is a Python library that implements unsupervised topic modelling algorithms such as latent semantic analysis and latent Dirichlet allocation.

How to Implement a Neural Network in Python – Four-part tutorial on the basics of neural nets.

Hacker’s Guide to Neural Networks – Andrej Karpathy’s neural net tutorial.

Using pandas and scikit-learn for classification tasks – An interesting IPython Notebook published on GitHub by Skipper Seabold.
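
To give a flavour of what doing matrix operations "the NumPy way" means in the NumPy tutorials above, here is a minimal sketch that normalizes the rows of a matrix twice: once with explicit Python loops and once with broadcasting. The function names and the sample data are my own, chosen purely for illustration, and are not taken from any of the linked resources.

```python
import numpy as np


def row_normalize_loop(matrix):
    """Scale each row to sum to 1 using explicit Python loops."""
    result = np.empty_like(matrix, dtype=float)
    for i in range(matrix.shape[0]):
        row_sum = matrix[i].sum()
        for j in range(matrix.shape[1]):
            result[i, j] = matrix[i, j] / row_sum
    return result


def row_normalize_vectorized(matrix):
    """Same computation using broadcasting, with no Python-level loops."""
    # keepdims=True keeps the row sums as a column vector,
    # so the division broadcasts across every column of each row.
    return matrix / matrix.sum(axis=1, keepdims=True)


# Hypothetical sample data, used only to compare the two versions.
data = np.array([[1.0, 3.0], [2.0, 2.0], [4.0, 1.0]])
loop_result = row_normalize_loop(data)
vectorized_result = row_normalize_vectorized(data)

# Both approaches should agree up to floating-point tolerance.
print(np.allclose(loop_result, vectorized_result))
```

The vectorized version is shorter and pushes the looping into NumPy's compiled code, which is the habit these tutorials and exercises are trying to build.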

Books

Machine Learning in Action – Quite old now, but a fun book that shows how to implement many common machine learning algorithms in Python.

Natural Language Processing with Python – Like the Manning book, this one is showing its age, but it remains the best introduction to NLP with NLTK available.

Speech and Language Processing – Dan Jurafsky and James H. Martin’s book about NLP. This is amazing stuff.

Python for Data Science – An introduction to many important Python scientific computing tools, including NumPy, SciPy, Pandas and IPython Notebooks, with an eclectic set of applications.

Machine Learning for Hackers – A pragmatic introduction to machine learning topics, focused on usable implementations rather than theory.

Bayesian Methods for Hackers – It is what it sounds like: an introduction to Bayesian techniques from a code-first point of view.

Model Based Machine Learning – Early access version of Christopher Bishop’s new book.

Courses

CS109 Data Science – Harvard course with lectures, plus labs and solutions made using IPython Notebooks.

Learning from Data – Caltech course on the fundamentals of machine learning. Hard, un-watered-down material here.

Videos

Data School – 15 hours of videos and slides by data science experts. Math heavy.

Pandas from the Ground Up – PyCon 2015 presentation.

Neural Nets for Newbies – PyCon 2015 presentation about neural networks by Melanie Warrick. Quite approachable.

Machine Learning with Scikit-Learn I – First of two PyCon 2015 videos about sklearn.

Machine Learning with Scikit-Learn II – Second of two PyCon 2015 videos about sklearn.

Deep Learning Course – Full video of a set of Oxford lectures on deep learning by Nando de Freitas.

Blogs

Andrej Karpathy Blog – Stanford PhD student who writes a lot about machine learning, especially neural nets.

Hunch.net – John Langford’s blog about machine learning theory.

NLPers – Hal Daumé III’s blog about NLP topics.

PyImageSearch – A blog about computer vision in Python.