Machine Learning Reference List
This has been my personal reading list, first compiled ca. February 2016 & updated very infrequently (e.g. Oct 2016, Feb 2017, Sept 2017). The field moves so quickly that much of this may have been superseded by now. If you find it useful as well, that’s great.
I’m mostly interested in audio processing, so…
Jump Right In: Audio Processing via RNNs:

Generation/Synthesis of new sounds based on training set:
 Jake Fiala: “Deep Learning and Sound” http://fiala.uk/notes/deeplearningandsound01intro
 GRUV: https://github.com/MattVitelli/GRUV. Btw, the authors found that LSTM worked better than GRU.
 John Glover: http://www.johnglover.net/blog/generatingsoundwithrnns.html Glover used an LSTM fed by phase-vocoder frames (really just the STFT).
 Google Magenta for MIDI: https://magenta.tensorflow.org/welcometomagenta
 Google WaveNet for Audio… https://deepmind.com/blog/wavenetgenerativemodelrawaudio/
 WaveNet is slow. “Fast Wavenet”: https://github.com/tomlepaine/fastwavenet
 WaveNet in Keras: https://github.com/basveeling/wavenet
General Neural Network References:
 Books/Guides on Deep/Machine Learning: (all excellent)
 Tutorials/Videos:
 YouTube playlist on “Deep Learning”, taught at Oxford U. by Nando de Freitas: https://www.youtube.com/playlist?list=PLE6Wd9FREfW8dtjAuPoTuPcqmOV53Fu
 Andrew Ng’s online course on ML at Stanford comes highly recommended: http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599
 Stanford Tutorial: http://ufldl.stanford.edu/tutorial/
 Concepts in NN/Deep Learning:
 Backpropagation (i.e. the chain rule):
 Chris Olah on backprop: http://colah.github.io/posts/201508Backprop/
 Karpathy on backprop: http://cs231n.github.io/optimization2/
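To make the chain-rule point concrete, here’s a minimal numpy sketch of my own (not from the links above), backpropagating through a one-neuron network:

```python
import numpy as np

# y = sigmoid(w*x + b), loss L = 0.5*(y - t)^2
x, w, b, t = 0.5, 1.2, -0.3, 1.0
z = w * x + b                   # forward pass
y = 1.0 / (1.0 + np.exp(-z))
L = 0.5 * (y - t) ** 2

dL_dy = y - t                   # backward pass: chain rule, outermost first
dy_dz = y * (1.0 - y)           # derivative of the sigmoid
dz_dw = x
dL_dw = dL_dy * dy_dz * dz_dw   # this product of local derivatives IS backprop
print(L, dL_dw)
```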
Recurrent Neural Networks (RNN) (which mostly feature LSTM nowadays):
 RNNs in general:
 Karpathy post: http://karpathy.github.io/2015/05/21/rnneffectiveness/, Karpathy talk: https://www.youtube.com/watch?v=yCC09vCHzF8
 Excellent annotated CharNN in Keras tutorial: http://ml4a.github.io/guides/recurrent_neural_networks/
 Andrew Trask post/tutorial: https://iamtrask.github.io/2015/11/15/anyonecancodelstm/
 Denny Britz post: http://www.wildml.com/2015/09/recurrentneuralnetworkstutorialpart1introductiontornns/
 Class notes/tutorial (long!): http://minds.jacobsuniversity.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf
 CS class notes (short): https://www.willamette.edu/~gorr/classes/cs449/rnn1.html
 Excellent post by Ross Goodwin on RNNs: https://medium.com/@rossgoodwin/adventuresinnarratedreality6516ff395ba3#.q2xh8dp5t
 Great list of references: https://handong1587.github.io/deep_learning/2015/10/09/rnnandlstm.html
 RNNs in TensorFlow: https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html
 Theano tutorial: http://deeplearning.net/tutorial/rnnslu.html
 Batch Normalization for RNNs: https://arxiv.org/abs/1510.01378: “applying batch normalization to the hidden-to-hidden transitions of our RNNs doesn’t help the training procedure. We also show that when applied to the input-to-hidden transitions, batch normalization can lead to a faster convergence of the training criterion but doesn’t seem to improve the generalization performance”
Traditional RNNs suffer from vanishing/exploding gradients (see the sketch below); hence LSTM & others…
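A quick numpy toy of my own showing why: backprop through time multiplies the gradient by the recurrent Jacobian once per step, so it shrinks or grows geometrically with sequence length:

```python
import numpy as np

T = 50                            # sequence length (number of BPTT steps)
g0 = np.ones(4)                   # gradient arriving at the last time step
for scale in (0.9, 1.1):          # spectral radius < 1 vs. > 1
    W = scale * np.eye(4)         # toy recurrent weight matrix
    g = g0.copy()
    for _ in range(T):
        g = W.T.dot(g)            # one step of backprop through time
    print(scale, np.linalg.norm(g))  # ~0.01 (vanished) vs. ~235 (exploded)
```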
Long Short-Term Memory (LSTM):
 Tutorial: http://nbviewer.jupyter.org/github/JonathanRaiman/theano_lstm/blob/master/Tutorial.ipynb
 Chris Olah post: http://colah.github.io/posts/201508UnderstandingLSTMs
 Zach Lipton post, “Demystifying LSTM” (with tutorial Theano code): http://blog.terminal.com/demistifyinglongshorttermmemorylstmrecurrentneuralnetworks/
 Demo: Lightweight TheanoLSTM: https://github.com/JonathanRaiman/theano_lstm
 Massive 33page review article by Lipton et al: http://arxiv.org/abs/1506.00019
 As of March 2016, Keras forum posts showed that “stateful” RNNs were still an active dev issue. (As of last year, Keras had LSTM but was resetting the “state” between batches, which was inconvenient & slow.) …Update Sept 2016: seems to be fixed.
 LSTM tutorial in Tensorflow: https://www.tensorflow.org/versions/r0.10/tutorials/recurrent/index.html
 Stateful LSTM in Keras for timeseries prediction: https://github.com/fchollet/keras/blob/master/examples/stateful_lstm.py
 Much-needed docs on stateful LSTM in Keras (see also the sketch after this list): http://philipperemy.github.io/kerasstatefullstm/
 Tensorflow sequence prediction: http://mourafiq.com/2016/05/15/predictingsequencesusingrnnintensorflow.html
 LSTM backpropagation tutorial :) http://arunmallya.github.io/writeups/nn/lstm/index.html#/
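Here’s a minimal stateful-LSTM sketch of my own (assuming Keras 2-style arguments; layer sizes and data are made up). With stateful=True, the hidden state carries over between batches instead of being reset, so long sequences can be fed in consecutive chunks:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, features = 1, 10, 1
model = Sequential([
    LSTM(32, batch_input_shape=(batch_size, timesteps, features),
         stateful=True),          # state persists across batches
    Dense(1),
])
model.compile(loss='mse', optimizer='adam')

X = np.random.rand(100, timesteps, features)
y = np.random.rand(100, 1)
for epoch in range(5):            # manual epoch loop: shuffling must stay off
    model.fit(X, y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()          # reset only between epochs (or sequences)
```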
LSTM Alternatives/advances:
 GRU (Gated Recurrent Unit) by Cho et al, “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”, http://arxiv.org/pdf/1406.1078v3.pdf (2014)
 Chung et al.: good explanation of GRU & LSTM; they find GRU comparable to LSTM: http://arxiv.org/abs/1412.3555
 But GRUV (M. Vitelli) found that LSTM outperformed GRU for audio accuracy
 GRUs are a bit simpler than LSTM; Britz blog/tutorial: http://www.wildml.com/2015/10/recurrentneuralnetworktutorialpart4implementingagrulstmrnnwithpythonandtheano
 Clockwork RNN by Koutnik et al: http://arxiv.org/pdf/1402.3511v1.pdf
 Highway networks…
 Echo State Networks (ESN). (2008) Comparison of MLP, RNN & ESN for sequence modeling: https://www.researchgate.net/publication/224374378_A_comparison_of_MLP_RNN_and_ESN_in_determining_harmonic_contributions_from_nonlinear_loads
 Undecimated Fully Convolutional Neural Networks (UFCNN): http://arxiv.org/pdf/1508.00317.pdf
 ConvNet for Audio – Spotify analysis & recommendation: http://benanne.github.io/2014/08/05/spotifycnns.html
LSTM for Sequence to Sequence Learning:
 Main paper: http://papers.nips.cc/paper/5346sequencetosequencelearningwithneuralnetworks.pdf
 There’s an encoder step and a decoder step (see the sketch after this list)
 Example: https://bigaidream.gitbooks.io/subsets_ml_cookbook/content/dl/theano/theano_keras_sequence2sequence.html#kerasforsequencetosequencelearning
 Keras Seq2Seq extension: https://github.com/farizrahman4u/seq2seq
 Multiple blog pages, re. language model: https://indico.io/blog/sequencemodelingneuralnetspart1/
 TensorFlow tutorial: https://www.tensorflow.org/versions/r0.10/tutorials/seq2seq/index.html WARNING: “It takes about 18GB of disk space and several hours to prepare the training corpus.” :(
 Stack Exchange post on pitch-shift mapping: http://stats.stackexchange.com/questions/220307/rnnlearningsinewavesofdifferentfrequencies
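As a concrete (if simplified) picture of the encoder/decoder split, here’s a sketch of my own in Keras using RepeatVector, rather than the paper’s exact architecture; all shapes are made up:

```python
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

in_steps, out_steps, features = 20, 10, 8
model = Sequential([
    LSTM(64, input_shape=(in_steps, features)),  # encoder: keep final state only
    RepeatVector(out_steps),          # hand that summary to every decoder step
    LSTM(64, return_sequences=True),              # decoder
    TimeDistributed(Dense(features)),             # per-timestep output
])
model.compile(loss='mse', optimizer='adam')
model.summary()
```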
 Denoising:
 http://mlsp.cs.cmu.edu/people/rsingh/docs/waspaa2015.pdf
 Denoising Autoencoder (a minimal sketch follows below): https://www.quora.com/Isitpossibletocreateanadaptivefilterusingneuralnetworksothataftertrainingitcanfilternoisysignalandgivedesiredoutput, https://www.quora.com/Canadenoisingautoencoderremoveorfilternoiseinanoisysignallikeaudioandrecoverthecleansignal
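The idea is just to train on (noisy input → clean target) pairs; a minimal sketch of my own, with random stand-in data where real audio frames would go:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

dim = 128                          # e.g. one frame of spectrogram bins
clean = np.random.rand(1000, dim)  # stand-in for real clean audio frames
noisy = clean + 0.1 * np.random.randn(1000, dim)   # corrupt the input

model = Sequential([
    Dense(32, activation='relu', input_shape=(dim,)),  # encoder/bottleneck
    Dense(dim, activation='linear'),                   # decoder
])
model.compile(loss='mse', optimizer='adam')
model.fit(noisy, clean, epochs=10, batch_size=32, verbose=0)
denoised = model.predict(noisy[:5])
```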
 Dereverberation:
Extended Memory Architectures:
 Memory Networks (Weston et al.):
 First paper: https://arxiv.org/abs/1410.3916.
 Tutorial: http://www.thespermwhale.com/jaseweston/icml2016/
 End-to-End version: http://arxiv.org/abs/1503.08895
 Keras version: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py
 Stack-Augmented Recurrent Nets, Joulin & Mikolov, “Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets,” https://arxiv.org/pdf/1503.01007.pdf (2015)
 Neural Turing Machines (NTM), Graves et al, https://arxiv.org/pdf/1410.5401v2.pdf
 Neural Stack Machines:
 Original paper: “Learning to Transduce with Unbounded Memory” by Grefenstette et al.: https://arxiv.org/abs/1506.02516
 Trask’s tutorial blog on Neural Stack Machines
Convolutional Neural Networks:
 Video: “What is wrong with convolutional neural nets?”, Geoffrey Hinton, Fields Institute, August 2017: https://www.youtube.com/watch?v=Mqt8fs6ZbHk&feature=youtu.be
 Excellent: “History of major convnet architectures” (LeNet, AlexNet, Inception, ResNet, VGG, …): https://culurciello.github.io/tech/2016/06/04/nets.html
 Excellent: “A guide to convolution arithmetic for deep learning” by Dumoulin and Visin https://arxiv.org/pdf/1603.07285.pdf
 Glossary/Summary of conv net terms/concepts:
 Vector: not a true vector in the sense of vector calculus. Just a one-dimensional array. “N-dimensional vector” = a 1-D array with N elements.
 Tensor: not a true tensor in the sense of differential geometry. Just a multi-dimensional array or “matrix”.
 Affine Transformation: General math term; here we just mean multiplying by a tensor and (maybe) adding a constant bias (vector). Generalization of “linear transformation.”
 Convolution: Pretty much what you’d normally think of as “convolution” in the DSP sense. The following analogy helps me too: evaluating a finite-difference stencil on a discretised scalar field via a banded (e.g. tridiagonal) matrix would be considered a convolution in the CNN sense, because said matrix is sparse and the same weights are used throughout.
 Channel Axis: (quoting D&V): “is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).”
 Feature Map: Generally, the output of running one particular convolution kernel over a data item (e.g. over an image). However, there are also input feature maps, examples of which are the “channels” referred to earlier (e.g. RGB, Left/Right).
 Flattening: turn a tensor into a vector
 Pooling: can think of it like a special type of convolution kernel (except it may not just add up the kernel’s inputs). Usually “Max Pooling”, as in: take the maximum value from the kernel’s inputs. (On the other hand, “Average Pooling” really is just a regular top-hat convolution.) In contrast to regular convolution, pooling does not involve zero padding, and pooling often takes place over non-overlapping regions of the input.
 (Zero-)Padding: Pretty much like in the DSP sense: add zeros to the front or end of a data stream or image, so that you can run a convolution kernel all the way up to & over the boundaries of where the data’s defined.
 Transposed Convolution: Analogous to transposing a matrix to get an output with oppositely-ordered shape, e.g. to go from an output feature map of one shape back to the original shape of the input. There seems to be some confusion, whereby some people treat the transpose as if it were an inverse of the convolution, which in general it is not.
 1x1: Actually 1x1xC, where C is, e.g., the number of color channels in an RGB image (3).
 An observation: the bigger the kernel, the smaller its output feature map (absent padding); see the sketch after this list.
 Related: Deconvolutional Networks http://www.matthewzeiler.com/pubs/cvpr2010/cvpr2010.pdf
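The glossary’s shape claims (kernel size, stride, padding, pooling) all follow from one formula in Dumoulin & Visin; a tiny sketch of my own:

```python
# Output length of a 1-D convolution/pooling window, per Dumoulin & Visin:
# out = floor((n + 2*pad - k) / stride) + 1
def conv_out_len(n, k, stride=1, pad=0):
    return (n + 2 * pad - k) // stride + 1

print(conv_out_len(28, 3))            # 26: bigger kernel -> smaller map
print(conv_out_len(28, 5))            # 24
print(conv_out_len(28, 3, pad=1))     # 28: "same" padding preserves size
print(conv_out_len(28, 2, stride=2))  # 14: a typical 2x2 max-pool, per axis
```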
Reinforcement Learning:
 OpenAI Gym: https://openai.com/blog/openaigymbeta/
 Tutorial for the cart-pole problem: http://kvfrans.com/simplealgoritmsforsolvingcartpole/
 For games: Giraffe
 Atari DRL: Video: https://www.youtube.com/watch?v=V1eYniJ0Rnk
 BetaGo: https://github.com/maxpumperla/betago
Data Representation:
Scattering Hierarchy (multiscale representation) by Mallat (2012-2014); Pablo Sprechmann’s talk: https://youtu.be/OS6rZXKVU1Y?t=20m44s
Related Approaches to Neural Networks: (historical)
 Hidden Markov Models (HMM). Dahl et al. used for text classification: George E. Dahl, Ryan P. Adams, and Hugo Larochelle, “Training Restricted Boltzmann Machines on Word Observations,” arXiv:1202.5695v1 (2012)
 Support Vector Machine (SVM). SVM training is a convex optimization problem with a global optimum, which is nice (whereas NN training is non-convex). Very effective for classification tasks, but NNs have beaten them out for complex datasets & tasks. Audio app: e.g., audio classification by Guo & Li (2003): http://www.ee.columbia.edu/~sfchang/course/sprF05/papers/guolisvmaudio00.pdf
 Restricted Boltzmann Machine (RBM). Hinton et al., mid-2000s
Frameworks (too many to choose from!):
 Main Ones:
 Theano - mature codebase, non-CUDA GPU support via libgpuarray
 TensorFlow - Google-supported, awesome viz tool TensorBoard
 Keras - runs on Theano or TensorFlow as backends. VERY popular
 Torch - used by LeCun & Karpathy, scripting in Lua. Not Python.
 PyTorch - Python bindings for Torch, includes ‘automatic differentiation’
 Scikit-Learn - general system for many methods; some Keras support. Allows ‘easy’ swapping of different ML methods & models
 Others, not seeing these used as much:
 Caffe - supposed to be easy & abstract
 Lasagne - another Theano front end for abstraction & ease of use
 Mozi - another one built on Theano. Looks simple to use
 DeepLearning4J - the “J” is for “Java”
 scikits.neural - not popular
 Which package to choose when starting out?
 I say Keras. Everything’s super-easy and automated compared to the others.
More Tutorials (e.g., appspecific):
 Lots at http://machinelearningmastery.com
 Andrew Trask’s Blog: Andrew writes excellent tutorials. The first LSTM guide I read was his.
 Tutorials on Theano, Keras, Lasagne, RNN: https://github.com/Vict0rSch/deep_learning
 Theano:
 Theano basics: http://nbviewer.jupyter.org/github/craffel/theanotutorial/blob/master/Theano%20Tutorial.ipynb
 Then crash course via code: https://github.com/Newmu/TheanoTutorials
 LSTM in Theano: http://nbviewer.jupyter.org/github/JonathanRaiman/theano_lstm/blob/master/Tutorial.ipynb
 Tensorflow:
 TensorFlow graph vis tutorial: https://www.tensorflow.org/versions/r0.8/how_tos/graph_viz/index.html
 TensorFlow on AWS (tutorial video): https://www.youtube.com/watch?v=1QhCsO4jmoM
 LSTM tutorial in Tensorflow: https://www.tensorflow.org/versions/r0.10/tutorials/recurrent/index.html
 Torch:
 First, Lua: “Learn Lua in 15 Minutes”: http://tylerneylon.com/a/learnlua/
 Deep Learning in Torch: a 60minute Blitz by Soumith Chintala at CVPR2015: https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb
 LSTM in Torch (by Zaremba) https://github.com/wojzaremba/lstm
 Tutorial Videos: https://www.youtube.com/playlist?list=PLLHTzKZzVU9ebuL6DCclzI54MrPNFGqbW
More Demos:
 Anything by @karpathy
 Lightweight TheanoLSTM: https://github.com/JonathanRaiman/theano_lstm
 TensorFlow Playgrounds: http://playground.tensorflow.org
Audio Applications:
 Huang et al, “Deep Recurrent NNs for Source Separation” http://posenhuang.github.io/papers/Joint_Optimization_of_Masks_and_Deep%20Recurrent_Neural_Networks_for_Monaural_Source_Separation_TASLP2015.pdf
 Quote: “in parallel, for improving the efficiency of DRNN training, utterances are chopped into sequences of at most 100 time steps”
 Ron Weiss (Google) talk: “Training neural network acoustic models on waveforms” https://www.youtube.com/watch?v=sI_8EA0_ha8
 Music comp: http://www.hexahedria.com/2015/08/03/composingmusicwithrecurrentneuralnetworks/
 Predicting time sequences with LSTM & Theano; see GRUV (above)
 “Multi-Resolution Linear Prediction Based Features for Audio Onset Detection with Bidirectional LSTM Neural Networks,” Erik Marchi, Giacomo Ferroni, Florian Eyben, Leonardo Gabrielli, Stefano Squartini, Björn Schuller: http://mediatum.ub.tum.de/doc/1238131/865625.pdf
 John Glover on generating instrument sounds with RNN: http://www.johnglover.net/blog/generatingsoundwithrnns.html
 Example of using spectrograms as images (for an imagebased classifier): http://stackoverflow.com/questions/37213388/kerasaccuracydoesnotchange
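For the spectrogram-as-image approach, a sketch of my own using scipy (a sine wave stands in for real audio; shapes will vary with your STFT settings):

```python
import numpy as np
from scipy.signal import spectrogram

sr = 16000
t = np.arange(sr) / float(sr)
audio = np.sin(2 * np.pi * 440.0 * t)      # 1 second of A440 as fake audio

f, times, S = spectrogram(audio, fs=sr, nperseg=256, noverlap=128)
S = np.log(S + 1e-10)                      # log-magnitude, as is typical
X = S[np.newaxis, :, :, np.newaxis]        # (batch, freq, time, channels)
print(X.shape)                             # here: (1, 129, 124, 1)
```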
Datasets of (mostly musical) Audio for Machine Learning:
 IRMAS: for musical instrument recognition: http://www.mtg.upf.edu/download/datasets/irmas
 Fraunhofer IDMT datasets: (Scroll down to “Published Datasets” on http://www.idmt.fraunhofer.de/en/business_units/m2d/research.html)
 IDMT-SMT-Bass - an audio database for bass transcription and signal processing
 IDMT-SMT-Audio-Effects - an audio database for automatic effect detection in recordings of electric guitar and bass
 IDMT-SMT-Bass-Synthesis - a digital waveguide model of the electric bass guitar including different playing techniques
 IDMT-SMT-Bass-Single-Track - multitrack studio recordings of live performances in Swing, Blues and Funk styles
 IDMT-SMT-Guitar - an audio database for guitar transcription and signal processing
 IDMT-SMT-Drums - an audio database for drum transcription and source separation
 Multi-codec Invdec Tampering Dataset
 Massive list of datasets (most are MIDI though): http://www.audiocontentanalysis.org/datasets/
 Another massive list of datasets (with many repeats from above): http://wiki.schindler.eu.com/doku.php?id=datasets:overview
 Melody annotation dataset: http://medleydb.weebly.com/description.html
 Binaural audio:
 Antoine Deleforge: http://perception.inrialpes.fr/~Deleforge/AVASM_Dataset/
Activations & Optimizers
 ELU: Exponential Linear Unit, seems to work better than ReLU in many cases https://arxiv.org/pdf/1511.07289v1.pdf
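For reference, the ELU itself is one line (per the paper, arXiv:1511.07289):

```python
import numpy as np

def elu(x, alpha=1.0):
    # identity for x > 0; smooth saturation toward -alpha for x < 0,
    # unlike ReLU's hard zero
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-2.0, -0.5, 0.0, 3.0])))
```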
In Physics:
 “Fast cosmological parameter estimation using neural networks”, T. Auld, M. Bridges, M.P. Hobson and S.F. Gull, MNRAS 000, 1–6 (2004), https://arxiv.org/pdf/astro-ph/0608174.pdf
 “Parameterized Neural Networks for HighEnergy Physics”, Baldi, P., Cranmer, K., Faucett, T., Sadowski, P., Whiteson, D. The European Physical Journal C. 76, 235, 17, May 2016, 2016, https://arxiv.org/pdf/1601.07913.pdf
Hardware:
 Amazon Web Services (AWS):
 Stanford disk image (AMI) with everything preinstalled: https://cs231n.github.io/awstutorial/
 …or just grab some other “Community AMI” with CUDA etc installed
 Another AWS setup: https://github.com/andreasjansson/simpleawsgpusetup
 TensorFlow on AWS (tutorial video): https://www.youtube.com/watch?v=1QhCsO4jmoM
 My AWS Setup:
ami-a96285c4
(AMI: old CUDA but works: ami-63bf8209; do not like: 11777_MML (ami-37a58f5d) or DeepestLearning.) When you create your own AMI it brings your server down. :(
pip install --upgrade pip
pip install -U numpy
sudo pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
sudo pip install awscli h5py
git clone https://github.com/fchollet/keras.git
cd keras
sudo python setup.py install
FACTOR OF 10 SPEEDUP using the g2.2xlarge GPUs vs. my MacBook Pro (no GPU)!!
Checkpointing:
run the ‘watch’ command to copy checkpoints to S3 every 550 seconds:
watch -n 550 aws s3 cp /tmp/weights.hdf5 s3://hawleymainbucket
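To keep /tmp/weights.hdf5 fresh for that watch loop, Keras’s ModelCheckpoint callback can write it during training; a minimal sketch of my own with a made-up toy model:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint

model = Sequential([Dense(8, activation='relu', input_shape=(4,)), Dense(1)])
model.compile(loss='mse', optimizer='adam')

# overwrite /tmp/weights.hdf5 whenever validation loss improves
ckpt = ModelCheckpoint('/tmp/weights.hdf5', monitor='val_loss',
                       save_best_only=True, verbose=1)
X, y = np.random.rand(200, 4), np.random.rand(200, 1)
model.fit(X, y, validation_split=0.2, epochs=10, callbacks=[ckpt])
```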
 Theano GPU setup guide: https://github.com/andreasjansson/simpleawsgpusetup
 OpenMP: Don’t forget to enable multiple OpenMP threads! Can get you at least a factor of 2 speedup!
 In most ‘modern’ Python installations (e.g. anaconda) OpenMP is automatic
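If it isn’t automatic, the usual knob is the OMP_NUM_THREADS environment variable, set before numpy/Theano are imported; Theano also has its own openmp flag:

```python
import os
os.environ['OMP_NUM_THREADS'] = '4'          # e.g. one per physical core
os.environ['THEANO_FLAGS'] = 'openmp=True'   # Theano's own OpenMP switch

import numpy as np                           # import AFTER setting the env
a = np.random.rand(2000, 2000)
b = np.dot(a, a)                             # threaded BLAS matrix multiply
```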
 My proposed PC build: https://pcpartpicker.com/user/drscotthawley/saved/bFZ8dC
SelfOrganizing Maps:
 “Unsupervised Classification of Audio Signals by SelfOrganizing Maps and Bayesian Labeling”: http://link.springer.com/chapter/10.1007%2F9783642289422_6
 “Visualization of Tonal Content in the Symbolic and Audio Domains“ http://www.ccarh.org/publications/cm/15/cm1510toiviainen.pdf
“Weird Stuff”:
 Stochastic-depth deep NN for image recognition: http://arxiv.org/pdf/1603.09382v1.pdf