datasets

Routines for loading/handling data

This is patterned after and relies upon aeiou.datasets, but includes some differences, such as the use of audiomentations (and perhaps Pedalboard at some point).

Eventually changes from this file will be merged into aeiou.datasets. …But not today!
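For reference, the audiomentations call pattern this module relies on looks roughly like the following (a standalone sketch using default effect parameters; not code from this module):

import numpy as np
from audiomentations import LowPassFilter

samples = np.random.uniform(-1, 1, 48000).astype(np.float32)  # one second of noise at 48 kHz
effect = LowPassFilter(p=1.0)                                  # p=1.0 => always apply the effect
processed = effect(samples=samples, sample_rate=48000)         # audiomentations transforms are callables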

DualEffectsDataset class


source

DualEffectsDataset

 DualEffectsDataset (paths, filenames=None, sample_rate=48000,
                     sample_size=65536, random_crop=True, load_frac=1.0,
                     num_gpus=8, redraw_silence=True, silence_thresh=-60,
                     max_redraws=2, augs='Stereo(), PhaseFlipper()',
                     effects_list=[Gain, BandPassFilter, BandStopFilter,
                                   HighPassFilter, LowPassFilter],
                     verbose=False)

For each __getitem__, this will grab two bits of audio and apply the same effect to both of them.

|  | Type | Default | Details |
|---|---|---|---|
| paths |  |  | list of strings of directory (/tree) names to draw audio files from |
| filenames | NoneType | None | allow passing in the list of filenames again (e.g. for the val set) to skip searching them all |
| sample_rate | int | 48000 | audio sample rate in Hz |
| sample_size | int | 65536 | how many audio samples in each "chunk" |
| random_crop | bool | True | take chunks from random positions within files |
| load_frac | float | 1.0 | fraction of total dataset to load |
| num_gpus | int | 8 | used only when cache_training_data=True, to avoid duplicates |
| redraw_silence | bool | True | a chunk containing silence will be replaced with a new one |
| silence_thresh | int | -60 | threshold in dB below which we declare silence |
| max_redraws | int | 2 | when redrawing silences, don't do it more than this many times |
| augs | str | Stereo(), PhaseFlipper() | list of augmentation transforms applied after PadCrop, given as a string |
| effects_list | list | [Gain, BandPassFilter, BandStopFilter, HighPassFilter, LowPassFilter] | list of audiomentations effect classes to choose from (other options include PitchShift, TanhDistortion) |
| verbose | bool | False | whether or not to print notices of resampling |
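As a construction example (the path below is hypothetical, and DualEffectsDataset is assumed to be in scope as in the test cells that follow): effects_list takes the audiomentations classes themselves, while augs is given as a string of transform calls.

from audiomentations import Gain, LowPassFilter, HighPassFilter

dataset = DualEffectsDataset(
    ['/path/to/my/audio'],              # hypothetical directory (/tree) of audio files
    sample_rate=48000,
    sample_size=65536,
    augs='Stereo(), PhaseFlipper()',    # augmentations applied after PadCrop, passed as a string
    effects_list=[Gain, LowPassFilter, HighPassFilter],  # classes (not instances) to draw effects from
)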

Testing DualEffectsDataset

Quick checks to catch minor errors and explore

(note that CI will not execute the following cells)

data_path = '../aeiou/examples/'
dataset = DualEffectsDataset(data_path)
data = dataset.__getitem__(0)
print(data)
augs = Stereo(), PhaseFlipper()
effects_list =  ['Gain', 'BandPassFilter', 'BandStopFilter', 'HighPassFilter', 'LowPassFilter']
AudioDataset:2 files found.
{'a': tensor([[-0.0661, -0.0648, -0.0633,  ...,  0.0558,  0.0524,  0.0495],
        [-0.0034, -0.0034, -0.0034,  ..., -0.0239, -0.0192, -0.0166]]), 'b': tensor([[ 0.0185,  0.0073, -0.0046,  ..., -0.0108, -0.0156, -0.0141],
        [-0.0403, -0.0522, -0.0583,  ..., -0.0241, -0.0216, -0.0179]]), 'a1': tensor([[-0.0661, -0.0661, -0.0660,  ...,  0.0689,  0.0668,  0.0644],
        [-0.0034, -0.0034, -0.0034,  ..., -0.0514, -0.0458, -0.0401]]), 'b1': tensor([[ 0.0185,  0.0184,  0.0177,  ..., -0.0062, -0.0065, -0.0070],
        [-0.0403, -0.0405, -0.0411,  ..., -0.0140, -0.0170, -0.0193]]), 'a2': tensor([[-0.0661, -0.0651, -0.0645,  ...,  0.0875,  0.0861,  0.0853],
        [-0.0034, -0.0034, -0.0034,  ..., -0.0180, -0.0190, -0.0212]]), 'b2': tensor([[ 0.0185,  0.0081, -0.0016,  ..., -0.0053, -0.0090, -0.0063],
        [-0.0403, -0.0514, -0.0555,  ..., -0.0057, -0.0005,  0.0051]]), 'e1': 'LowPassFilter', 'e2': 'BandStopFilter'}
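Judging from the output above, each item is a dict: 'a' and 'b' are two clean chunks, 'a1'/'b1' are those same chunks with effect e1 applied, 'a2'/'b2' with effect e2, and 'e1'/'e2' hold the effect names. A quick way to inspect an item (an informal snippet, not part of the library):

data = dataset[0]
for k, v in data.items():
    # tensors print their shape; the effect-name entries are plain strings
    print(k, v.shape if hasattr(v, 'shape') else v)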

Test how the DataLoader behaves in dict pipeline mode:

dataset = DualEffectsDataset(data_path)
train_dl = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
batch = next(iter(train_dl))
print("batch =\n",batch)
augs = Stereo(), PhaseFlipper()
effects_list =  ['Gain', 'BandPassFilter', 'BandStopFilter', 'HighPassFilter', 'LowPassFilter']
AudioDataset:2 files found.
batch =
 {'a': tensor([[[-3.0239e-04, -3.8517e-04, -6.0043e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-3.0239e-04, -3.8517e-04, -6.0043e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00]],

        [[ 2.9149e-01,  2.2990e-01,  1.7710e-01,  ..., -2.4063e-02,
          -2.3992e-02, -2.1247e-02],
         [ 1.1003e-04,  1.6797e-04,  1.3461e-04,  ..., -5.7370e-03,
          -5.6048e-03, -5.5120e-03]]]), 'b': tensor([[[-0.0003, -0.0004, -0.0006,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0003, -0.0004, -0.0006,  ...,  0.0000,  0.0000,  0.0000]],

        [[ 0.1027,  0.0918,  0.0796,  ..., -0.0584, -0.0534, -0.0412],
         [ 0.0017, -0.0006, -0.0032,  ...,  0.1276,  0.1195,  0.1094]]]), 'a1': tensor([[[-3.0239e-04, -3.5309e-04, -4.4016e-04,  ..., -0.0000e+00,
          -0.0000e+00, -0.0000e+00],
         [-3.0239e-04, -3.5309e-04, -4.4016e-04,  ..., -0.0000e+00,
          -0.0000e+00, -0.0000e+00]],

        [[ 2.9149e-01,  2.4293e-01,  2.2399e-01,  ..., -4.3179e-02,
          -4.1475e-02, -3.8351e-02],
         [ 1.1003e-04,  1.5571e-04,  1.0806e-04,  ..., -5.5205e-03,
          -5.4740e-03, -5.4839e-03]]]), 'b1': tensor([[[-0.0003, -0.0004, -0.0004,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0003, -0.0004, -0.0004,  ...,  0.0000,  0.0000,  0.0000]],

        [[ 0.1027,  0.0933,  0.0856,  ..., -0.0665, -0.0634, -0.0530],
         [ 0.0017, -0.0003, -0.0020,  ..., -0.0320, -0.0415, -0.0496]]]), 'a2': tensor([[[-1.3810e-04, -1.7590e-04, -2.7421e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-1.3810e-04, -1.7590e-04, -2.7421e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00]],

        [[ 1.1102e-16, -5.3172e-02, -8.3176e-02,  ..., -7.6607e-03,
          -5.7607e-03, -1.7675e-03],
         [ 2.7105e-20,  5.0022e-05,  6.5531e-06,  ...,  1.7718e-04,
           2.2987e-04,  2.2429e-04]]]), 'b2': tensor([[[-2.1308e-04, -2.7141e-04, -4.2310e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-2.1308e-04, -2.7141e-04, -4.2310e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00]],

        [[-6.9389e-17, -1.0133e-02, -1.9777e-02,  ..., -5.8415e-02,
          -6.6108e-02, -6.5935e-02],
         [-1.0842e-18, -2.1146e-03, -4.1841e-03,  ...,  4.9976e-03,
          -6.5876e-03, -1.8445e-02]]]), 'e1': ['BandStopFilter', 'BandStopFilter'], 'e2': ['Gain', 'HighPassFilter']}
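Note from the batch printout above that the default collate function stacks the audio tensors with a leading batch dimension and gathers the effect-name strings into per-batch lists. A quick sanity check (values shown are examples from the run above):

print(batch['a'].shape)   # e.g. torch.Size([2, 2, 65536]): batch, channels, samples
print(batch['e1'])        # e.g. ['BandStopFilter', 'BandStopFilter']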
batch = next(iter(train_dl))
a, b, a1, b1, a2, b2, e1, e2 = batch.values()     # clean chunks, their effected versions, and the two effect names
print("clean")
playable_spectrogram(a[0], output_type='live')
print(e1[0])
playable_spectrogram(a1[0], output_type='live')   # same chunk with effect e1 applied
print(e2[0])
playable_spectrogram(a2[0], output_type='live')   # same chunk with effect e2 applied
diff = a2[0] - a1[0]                              # difference between the two effected versions
playable_spectrogram(diff, output_type='live')