Over the last few years I’ve been playing around with NumPy, SciPy, scikit-learn and other Python libraries for data science and machine learning.
In the process, I’ve collected a bunch of nice resources that should be useful to anybody trying to get to grips with these topics in Python.
Tentative NumPy Tutorial – This is the NumPy tutorial from the SciPy wiki. It covers the basics and is written in a cookbook style, so it’s ideal for use as a reference. One to bookmark, for sure.
Python Scientific Lecture Notes – A really comprehensive set of notes that goes from basic NumPy and advanced standard Python features to symbolic mathematics, image processing and machine learning using scikit-learn, scikit-image and SymPy.
Quantitative Economics with Python – This site not only contains an in-depth introduction to Python scientific computing with applications to quantitative economics, but also touches on Pandas and IPython Notebooks, which are quickly becoming the standard for sharing computational ideas in Python.
NumPy for Matlab Users – Although my own foray into Matlab was limited to going through the Octave code for Stanford’s Machine Learning MOOC a few years ago, this tutorial has been recommended for people making the transition from Matlab to Python.
100 NumPy Exercises – Nicolas Rougier has put together a list of 100 exercises, graded from beginner to advanced levels, to teach people how to perform matrix operations the NumPy way. A great hands-on way to get to grips with the library.
Data Manipulation in Python – Mostly brief tutorials on manipulating and visualizing data from CSV files using Pandas.
Computational Statistics in Python – Ridiculously comprehensive.
Beat Detection Algorithms – Short blog post about automatically detecting the tempo of a piece of music. Not Python, but still interesting.
Beat Detection Algorithms, Part II – Second part of the above post.
Gensim Tutorial – Gensim is a Python library that implements unsupervised topic modelling algorithms, including latent semantic analysis and latent Dirichlet allocation.
How to Implement a Neural Network in Python – Four-part tutorial on the basics of neural nets.
Hacker’s Guide to Neural Networks – Andrej Karpathy’s neural net tutorial.
Using pandas and scikit-learn for classification tasks – An interesting IPython Notebook published on GitHub by Skipper Seabold.
Machine Learning in Action – Quite old now, but a fun book that shows how to implement many common machine learning algorithms in Python.
Natural Language Processing with Python – Like Machine Learning in Action, this one is showing its age, but it remains the best introduction to NLP with NLTK available.
Speech and Language Processing – Dan Jurafsky’s book about NLP. This is amazing stuff.
Python for Data Science – An introduction to many important Python scientific computing tools, including NumPy, SciPy, Pandas and IPython Notebooks, with an eclectic set of applications.
Machine Learning for Hackers – A pragmatic introduction to machine learning topics, focused on usable implementations rather than theory.
Bayesian Methods for Hackers – It is what it sounds like: an introduction to Bayesian techniques from a code-first point of view.
Model Based Machine Learning – Early access version of Christopher Bishop’s new book.
CS109 Data Science – Harvard course with lectures, plus labs and solutions made using IPython Notebooks.
Learning from Data – Caltech course on the fundamentals of machine learning. Hard material here, not watered down.
Data School – 15 hours of videos and slides by data science experts. Math heavy.
Pandas from the Ground Up – PyCon 2015 presentation.
Neural Nets for Newbies – PyCon 2015 presentation about neural networks by Melanie Warrick. Quite approachable.
Machine Learning with Scikit-Learn I – First of two PyCon 2015 videos about sklearn.
Machine Learning with Scikit-Learn II – Second of two PyCon 2015 videos about sklearn.
Deep Learning Course – Full video of a set of Oxford lectures on deep learning by Nando de Freitas.
Andrej Karpathy Blog – Stanford PhD student who writes a lot about machine learning, especially neural nets.
Hunch.net – John Langford’s blog about machine learning theory.
NLPers – Hal Daume III’s blog about NLP topics.
PyImageSearch – A blog about computer vision in Python.