Machine Learning Reference List

This has been my personal reading list, first compiled ca. February 2016 & updated very infrequently (e.g. Oct 2016, Feb 2017, Sept 2017). The field moves so quickly that much of this may have been superseded by now. If you find it useful as well, that’s great.

I’m mostly interested in audio processing, so…

Jump Right Into: Audio Processing via RNNs:

General Neural Network References:

Recurrent Neural Networks (RNN) (which mostly feature LSTM nowadays):

Traditional RNNs suffer from vanishing/exploding gradients. Hence LSTM & others…
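
A rough numerical sketch of why this happens, assuming a linear RNN (no activation function) whose recurrent matrix is a scaled orthogonal matrix. The function name gradient_norms and all the sizes are made up for illustration. Backpropagating through T time steps multiplies the gradient by the transposed recurrent matrix T times, so its norm behaves like scale^T:

```python
import numpy as np

def gradient_norms(w_scale, steps=50, size=16, seed=0):
    """Norm of a gradient backpropagated through `steps` steps of h_t = W h_{t-1}.

    W is an orthogonal matrix times w_scale (an assumption made for clarity),
    so each backward step rescales the gradient norm by exactly w_scale.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))  # q is orthogonal
    W = w_scale * q
    grad = np.ones(size) / np.sqrt(size)  # unit-norm gradient at the last step
    norms = []
    for _ in range(steps):
        grad = W.T @ grad  # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

shrinking = gradient_norms(0.9)  # final norm is about 0.9**50: vanishing
growing = gradient_norms(1.1)    # final norm is about 1.1**50: exploding
print(shrinking[-1], growing[-1])
```

With a tanh or sigmoid activation, each backward step also multiplies by a derivative that is at most 1, which pushes things further toward vanishing.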

Long Short-Term Memory (LSTM):

LSTM Alternatives/advances:

LSTM for Sequence to Sequence Learning:

Extended Memory Architectures:

Convolutional Neural Networks:

  • Video: “What is wrong with convolutional neural nets?” Geoffrey Hinton, Fields Institute, August 2017
  • Excellent: “History of major convnet architectures” (LeNet, AlexNet, Inception ResNet, VGG,…)
  • Excellent: “A guide to convolution arithmetic for deep learning” by Dumoulin and Visin
  • Glossary/Summary of conv net terms/concepts:
    • Vector: not a true vector in the sense of vector calculus. Just a one-dimensional array. “N-dimensional vector” = 1-D array with N elements.
    • Tensor: not a true tensor in the sense of differential geometry. Just a multi-dimensional array or “matrix”.
    • Affine Transformation: General math term; here we just mean multiplying by a tensor and (maybe) adding a constant bias (vector). Generalization of “linear transformation.”
    • Convolution: Pretty much what you’d normally think of as “convolution” in the DSP sense. The following analogy helps me too: Evaluating a finite-difference stencil on a discretised scalar field via a banded (e.g. tridiagonal) matrix would be considered a convolution in the CNN sense, because said matrix is sparse and the same weights are used throughout.
    • Channel Axis: (quoting D&V): “is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).”
    • Feature Map: Generally, the output of running one particular convolution kernel over a data item (e.g. over an image). However, there are also input feature maps, examples of which are the “channels” referred to earlier (e.g. RGB, Left/Right).
    • Flattening: turn a tensor into a vector
    • Pooling: can be thought of as a special type of convolution kernel (except it may not just add up the kernel’s inputs). Usually “Max Pooling”, as in: take the maximum value from the kernel’s inputs. (On the other hand, “Average pooling” really is just a regular top-hat convolution.) In contrast to regular convolution, pooling usually does not involve zero padding, and pooling often takes place over non-overlapping regions of the input.
    • (Zero-)Padding: Pretty much like in the DSP sense: add zeros to the front or end of a data stream or image, so that you can run the convolution kernel all the way up to & over the boundaries of where the data’s defined.
    • Transposed Convolution: Analogous to transposing a matrix to get an output with oppositely-ordered shape, e.g. to go from an output feature map of one shape, back to the original shape of the input. There seems to be some confusion, whereby some people treat the transpose as if it’s an inverse (??)
    • 1x1: Actually 1x1xC, where C is, e.g. the number of color channels in an RGB image (3).

    • An observation: the bigger the shape of the kernel, the smaller the shape of its output feature map
  • Related: Deconvolutional Networks
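
A few of the glossary items above can be checked numerically in 1-D. This is only a sketch with NumPy: the helper names conv_output_size and conv_matrix and the sample data are invented for illustration, and the size formula is the standard one from the convolution-arithmetic guide, o = floor((i + 2p - k) / s) + 1. It builds the banded matrix for a finite-difference stencil, shows that its transpose restores the input shape but not the input values, and compares max pooling with average pooling.

```python
import numpy as np

def conv_output_size(i, k, p=0, s=1):
    """Output length along one axis: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

def conv_matrix(kernel, i):
    """Banded matrix C so that C @ x slides `kernel` over a length-i input
    (no padding, stride 1). Every row reuses the same weights."""
    k = len(kernel)
    o = conv_output_size(i, k)
    C = np.zeros((o, i))
    for row in range(o):
        C[row, row:row + k] = kernel
    return C

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
stencil = np.array([1.0, -2.0, 1.0])  # second-difference stencil

# Convolution as a sparse, weight-sharing matrix product.
C = conv_matrix(stencil, len(x))
y = C @ x
# CNN-style "convolution" does not flip the kernel, so it matches np.correlate.
assert np.allclose(y, np.correlate(x, stencil, mode="valid"))

# Bigger kernel, smaller output; zero padding can restore the size.
print(conv_output_size(6, 3), conv_output_size(6, 5), conv_output_size(6, 3, p=1))

# Transposed convolution: C.T maps shape (4,) back to shape (6,),
# but it is not an inverse, so the values differ from x.
x_back = C.T @ y
print(x_back.shape, np.allclose(x_back, x))

# Pooling over non-overlapping windows of width 2.
windows = x.reshape(-1, 2)
max_pooled = windows.max(axis=1)
# Average pooling is a top-hat convolution evaluated with stride 2.
avg_pooled = np.correlate(x, np.full(2, 0.5), mode="valid")[::2]
assert np.allclose(avg_pooled, windows.mean(axis=1))
```

Flattening, in the same spirit, is just something like np.arange(6).reshape(2, 3).ravel().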

Reinforcement Learning:

Data Representation:

Scattering Hierarchy (multi-scale representation) by Mallat (2012-2014), Pablo Sprechmann’s talk

  • Hidden Markov Models (HMM). Dahl used for text classification: George E. Dahl, Ryan P. Adams, and Hugo Larochelle. “Training Restricted Boltzmann Machines on Word Observations.” arXiv:1202.5695v1 (2012)
  • Support Vector Machine (SVM). Training an SVM is a convex optimization problem, so any minimum found is the global one, which is nice (whereas NN loss surfaces are non-convex, so training only finds local minima). Very effective for classification tasks. But NNs have beaten them out for complex datasets & tasks. Audio app: e.g., Audio Classification by Guo & Li (2003)
  • Restricted Boltzmann Machine (RBM). Hinton et al. mid-2000s

Frameworks (too many to choose from!):

  • Main Ones:
    • Theano - mature codebase, non-CUDA GPU support via libgpuarray
    • TensorFlow - Google-supported, awesome viz tool TensorBoard
    • Keras, runs on Theano or TensorFlow as backends. VERY popular
    • Torch - used by LeCun & Karpathy, scripting in Lua. Not Python.
      • PyTorch - Python bindings for Torch, includes ‘automatic differentiation’

    • Scikit-Learn - General system for many methods; some Keras support. Allows ‘easy’ swapping of different ML methods & models
  • Others, not seeing these used as much:
    • Caffe, supposed to be easy & abstract
    • Lasagne - Another Theano front end for abstraction & ease of use
    • Mozi, Another one built on Theano. Looks simple to use
    • DeepLearning4J: The “J” is for “Java”
    • scikits.neural - not popular
  • Which package to choose when starting out?
    • I say Keras. Everything’s super-easy and automated compared to others.

More Tutorials (e.g., app-specific):

More Demos:

Audio Applications:

Datasets of (mostly musical) Audio for Machine Learning:

Activations & Optimizers

In Physics:

  • “Fast cosmological parameter estimation using neural networks”, T. Auld, M. Bridges, M.P. Hobson and S.F. Gull, MNRAS 000, 1–6 (2004)
  • “Parameterized Neural Networks for High-Energy Physics”, P. Baldi, K. Cranmer, T. Faucett, P. Sadowski and D. Whiteson, The European Physical Journal C 76, 235, 1–7 (May 2016)


pip install --upgrade pip
pip install -U numpy
sudo pip install git+git:// --upgrade --no-deps
sudo pip install awscli h5py
git clone <>
cd keras
sudo python setup.py install

FACTOR OF 10 SPEEDUP using the g2.xlarge GPUs vs my Macbook Pro (no GPU)!!


Run the ‘watch’ command to execute an AWS transfer to S3 every so many seconds
...and the spot instance went down without any checkpoint.

To allow uploading from EC2 to S3 (it's convoluted):

  • install the AWS CLI
  • create an "IAM" user
  • grant the user permissions to upload to S3 via <>
  • run aws configure

...good to go.

watch -n 550 aws s3 cp /tmp/weights.hdf5 s3://hawleymainbucket
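
The same periodic upload could also be driven from Python, which makes it easier to log failures or stop after a fixed number of uploads. This is only a sketch: the helper names build_upload_command and upload_periodically are hypothetical, and it assumes the aws CLI is installed and already configured as described above. The run and sleep arguments are injectable so the loop can be dry-run without touching AWS.

```python
import subprocess
import time

def build_upload_command(local_path, bucket):
    """Argument list equivalent to: aws s3 cp <local_path> s3://<bucket>"""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}"]

def upload_periodically(local_path, bucket, interval_s=550, max_uploads=None,
                        run=subprocess.run, sleep=time.sleep):
    """Re-upload local_path every interval_s seconds, like `watch -n 550 ...`.

    Loops forever when max_uploads is None. Returns the number of attempts.
    """
    count = 0
    while max_uploads is None or count < max_uploads:
        result = run(build_upload_command(local_path, bucket))
        # subprocess.run returns an object with a returncode; 0 means success.
        if getattr(result, "returncode", 0) != 0:
            print(f"upload {count} failed; will retry after the next interval")
        count += 1
        sleep(interval_s)
    return count

# Dry run: record the commands instead of executing them, and skip the waiting.
calls = []
attempts = upload_periodically("/tmp/weights.hdf5", "hawleymainbucket",
                               max_uploads=2,
                               run=lambda cmd: calls.append(cmd),
                               sleep=lambda seconds: None)
print(attempts, calls[0])
```

For a real run, drop the run, sleep and max_uploads arguments so the defaults call the aws CLI every 550 seconds.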

Self-Organizing Maps:

“Weird Stuff”: