Work in progress. If you come back later there'll probably be more. - SHH, 10/24/21.
There are many tutorials on audio classification; these usually involve rendering the audio as (mel-)spectrograms and doing image classification on those. Tutorials on audio processing or generation are rarer, though the list is growing. Lately I've become somewhat proficient with fastai and would like to port some audio processing examples over to it. There are a few choices of tasks and datasets -- there's great work on source separation, for example.
Since I've been interested in audio effects, I'll choose the task of reproducing Christian Steinmetz and Josh Reiss's micro-TCN work on learning to profile audio compressors. Their code uses PyTorch Lightning instead of fastai, but we should be able to do the bare-minimum fastai integration by following Zach Mueller's prescription. The experience gained from doing this can hopefully serve when adapting other audio tasks & models to work with fastai.
We could just grab any old audio data and learn some kind of inverse effect such as denoising: add noise to the audio files, then train the network to remove it. But what other audio datasets are available?
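To make that denoising idea concrete, here's a minimal sketch of how a (noisy input, clean target) training pair could be synthesized. This is pure Python with a made-up sine "signal" and an arbitrary noise amplitude, not part of any of the libraries used below:

```python
import math
import random

def make_denoising_pair(clean, noise_amp=0.05, seed=0):
    """Return a (noisy, clean) training pair by adding
    uniform white noise of amplitude noise_amp to the clean signal."""
    rng = random.Random(seed)
    noisy = [x + noise_amp * rng.uniform(-1, 1) for x in clean]
    return noisy, clean

# A fake "clean" signal: one cycle of a sine wave
clean = [math.sin(2 * math.pi * t / 100) for t in range(100)]
noisy, target = make_denoising_pair(clean)
```

A network trained on many such pairs would learn the mapping noisy → clean, i.e. the inverse of the corruption we applied.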
- torchaudio datasets. These are almost all about speech; only GTZAN is musical.
- Source separation datasets, i.e. mono-to-many
- ISMIR has a list of datasets
- We can always grab audio and then use Spotify's new Pedalboard to add effects
- Marco Martinez' Leslie effects dataset is a bit less than 1 GB. It has "dry" (input) and various target output directories such as "tremolo".
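For a sense of what such dry/target pairs look like: tremolo is just amplitude modulation by a low-frequency oscillator, so a target could be synthesized from a dry input roughly like this. This is a pure-Python sketch with made-up parameter values, not Martinez's actual processing chain:

```python
import math

def tremolo(dry, sample_rate=44100, rate_hz=5.0, depth=0.5):
    """Apply tremolo: multiply the dry signal by a slow sine LFO.
    The gain oscillates between (1 - depth) and 1."""
    out = []
    for n, x in enumerate(dry):
        lfo = 1.0 - depth * 0.5 * (1.0 + math.sin(2 * math.pi * rate_hz * n / sample_rate))
        out.append(x * lfo)
    return out

# Dry input: a 220 Hz sine tone, 0.1 s long
sr = 44100
dry = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(int(0.1 * sr))]
wet = tremolo(dry, sample_rate=sr)
```

The learning task is then the reverse direction of this script: given (dry, wet) pairs, infer the effect.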
```python
# Next line only executes on Colab.  Colab users: Please enable GPU in Edit > Notebook settings
! [ -e /content ] && pip install -Uqq pip fastai git+https://github.com/drscotthawley/fastproaudio.git

# Additional installs for this tutorial
%pip install -q fastai_minima torchsummary pyzenodo3 wandb

# Install micro-tcn and auraloss packages (from source, will take a little while)
%pip install -q wheel --ignore-requires-python git+https://github.com/csteinmetz1/micro-tcn.git git+https://github.com/csteinmetz1/auraloss

# After this cell finishes, restart the kernel and continue below
```
```
Note: you may need to restart the kernel to use updated packages.
WARNING: Missing build requirements in pyproject.toml for git+https://github.com/csteinmetz1/auraloss.
WARNING: The project does not specify a build backend, and pip cannot fall back to setuptools without 'wheel'.
```
```python
from fastai.vision.all import *
from fastai.text.all import *
from fastai.callback.fp16 import *
import wandb
from fastai.callback.wandb import *

import torch
import torchaudio
import torchaudio.functional as F
import torchaudio.transforms as T

from IPython.display import Audio
import matplotlib.pyplot as plt
import torchsummary
from fastproaudio.core import *
from glob import glob
import json

use_fastaudio = False
if use_fastaudio:
    from fastaudio.core.all import *
    from fastaudio.augment.all import *
    from fastaudio.ci import skip_if_ci
```
The "SignalTrain LA2A Reduced" dataset is something I made Friday night. It contains only the first 10 seconds of each of the 20-minute audio files making up the full SignalTrain LA2A dataset, which consists of many audio files run through an LA2A audio compressor at different knob settings. At 200 MB, the Reduced version is enough to train the model some and see that it's working for the purposes of this demo, though you'd probably want more data to make a high-quality model. (If you'd rather train using the full 20 GB dataset, use URLs.SIGNALTRAIN_LA2A_1_1 below, but everything will take longer!)
```python
path = get_audio_data(URLs.SIGNALTRAIN_LA2A_REDUCED); path
```
```python
fnames_in = sorted(glob(str(path)+'/*/input*'))
fnames_targ = sorted(glob(str(path)+'/*/*targ*'))
ind = -1  # pick one spot in the list of files
fnames_in[ind], fnames_targ[ind]
```
```python
import warnings
warnings.filterwarnings("ignore", category=UserWarning)  # turn off annoying matplotlib warnings

waveform, sample_rate = torchaudio.load(fnames_in[ind])
show_audio(waveform, sample_rate)
```
```
Shape: (1, 441000), Dtype: torch.float32, Duration: 10.0 s
Max: 0.225, Min: -0.218, Mean: 0.000, Std Dev: 0.038
```
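For reference, those summary statistics are straightforward to compute from the raw samples. Here's a sketch in plain Python on a dummy sine signal (show_audio presumably does the equivalent with tensor ops):

```python
import math

def audio_stats(samples, sample_rate):
    """Compute the duration and amplitude statistics shown above."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return {
        "duration_s": n / sample_rate,
        "max": max(samples),
        "min": min(samples),
        "mean": mean,
        "std": math.sqrt(var),
    }

# Dummy signal: 1 second of a 441 Hz sine at amplitude 0.2, 44.1 kHz sample rate
sr = 44100
sig = [0.2 * math.sin(2 * math.pi * 441 * n / sr) for n in range(sr)]
stats = audio_stats(sig, sr)
```

For a pure sine of amplitude A, the mean is ~0 and the standard deviation is A/√2, which is a handy sanity check.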