datasets

Routines for loading/handling data

This is patterned after and relies upon aeiou.datasets, but includes some differences, such as the use of audiomentations (and perhaps Pedalboard at some point).

Eventually changes from this file will be merged into aeiou.datasets. …But not today!
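For reference, the audiomentations call pattern this module relies on looks roughly like the following (a standalone sketch using default effect parameters; not code from this module):

import numpy as np
from audiomentations import LowPassFilter

samples = np.random.uniform(-1, 1, 48000).astype(np.float32)  # one second of noise at 48 kHz
effect = LowPassFilter(p=1.0)                                  # p=1.0 => always apply the effect
processed = effect(samples=samples, sample_rate=48000)         # audiomentations transforms are callables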

DualEffectsDataset class


source

DualEffectsDataset

 DualEffectsDataset (paths, filenames=None, sample_rate=48000,
                     sample_size=65536, random_crop=True, load_frac=1.0,
                     num_gpus=8, redraw_silence=True, silence_thresh=-60,
                     max_redraws=2, augs='Stereo(), PhaseFlipper()',
                     effects_list=[Gain, BandPassFilter, BandStopFilter,
                                   HighPassFilter, LowPassFilter],
                     verbose=False)

For each __getitem__, this will grab two bits of audio and apply the same effect to both of them.

|  | Type | Default | Details |
|---|---|---|---|
| paths |  |  | list of strings of directory (/tree) names to draw audio files from |
| filenames | NoneType | None | allow passing in the list of filenames again (e.g. for the val set) to skip searching them all |
| sample_rate | int | 48000 | audio sample rate in Hz |
| sample_size | int | 65536 | how many audio samples in each "chunk" |
| random_crop | bool | True | take chunks from random positions within files |
| load_frac | float | 1.0 | fraction of total dataset to load |
| num_gpus | int | 8 | used only when cache_training_data=True, to avoid duplicates |
| redraw_silence | bool | True | a chunk containing silence will be replaced with a new one |
| silence_thresh | int | -60 | threshold in dB below which we declare silence |
| max_redraws | int | 2 | when redrawing silences, don't do it more than this many times |
| augs | str | Stereo(), PhaseFlipper() | list of augmentation transforms applied after PadCrop, given as a string |
| effects_list | list | [Gain, BandPassFilter, BandStopFilter, HighPassFilter, LowPassFilter] | list of audiomentations effect classes to choose from (other options include PitchShift, TanhDistortion) |
| verbose | bool | False | whether or not to print notices of resampling |
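As a construction example (the path below is hypothetical, and DualEffectsDataset is assumed to be in scope as in the test cells that follow): effects_list takes the audiomentations classes themselves, while augs is given as a string of transform calls.

from audiomentations import Gain, LowPassFilter, HighPassFilter

dataset = DualEffectsDataset(
    ['/path/to/my/audio'],              # hypothetical directory (/tree) of audio files
    sample_rate=48000,
    sample_size=65536,
    augs='Stereo(), PhaseFlipper()',    # augmentations applied after PadCrop, passed as a string
    effects_list=[Gain, LowPassFilter, HighPassFilter],  # classes (not instances) to draw effects from
)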

Testing DualEffectsDataset

Quick checks to catch minor errors and explore

(note that CI will not execute the following cells)

data_path = '../aeiou/examples/'
dataset = DualEffectsDataset(data_path)
data = dataset.__getitem__(0)
print(data)
augs = Stereo(), PhaseFlipper()
effects_list =  ['Gain', 'BandPassFilter', 'BandStopFilter', 'HighPassFilter', 'LowPassFilter']
AudioDataset:2 files found.
{'a': tensor([[-0.0661, -0.0648, -0.0633,  ...,  0.0558,  0.0524,  0.0495],
        [-0.0034, -0.0034, -0.0034,  ..., -0.0239, -0.0192, -0.0166]]), 'b': tensor([[ 0.0185,  0.0073, -0.0046,  ..., -0.0108, -0.0156, -0.0141],
        [-0.0403, -0.0522, -0.0583,  ..., -0.0241, -0.0216, -0.0179]]), 'a1': tensor([[-0.0661, -0.0661, -0.0660,  ...,  0.0689,  0.0668,  0.0644],
        [-0.0034, -0.0034, -0.0034,  ..., -0.0514, -0.0458, -0.0401]]), 'b1': tensor([[ 0.0185,  0.0184,  0.0177,  ..., -0.0062, -0.0065, -0.0070],
        [-0.0403, -0.0405, -0.0411,  ..., -0.0140, -0.0170, -0.0193]]), 'a2': tensor([[-0.0661, -0.0651, -0.0645,  ...,  0.0875,  0.0861,  0.0853],
        [-0.0034, -0.0034, -0.0034,  ..., -0.0180, -0.0190, -0.0212]]), 'b2': tensor([[ 0.0185,  0.0081, -0.0016,  ..., -0.0053, -0.0090, -0.0063],
        [-0.0403, -0.0514, -0.0555,  ..., -0.0057, -0.0005,  0.0051]]), 'e1': 'LowPassFilter', 'e2': 'BandStopFilter'}
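Judging from the output above, each item is a dict: 'a' and 'b' are two clean chunks, 'a1'/'b1' are those same chunks with effect e1 applied, 'a2'/'b2' with effect e2, and 'e1'/'e2' hold the effect names. A quick way to inspect an item (an informal snippet, not part of the library):

data = dataset[0]
for k, v in data.items():
    # tensors print their shape; the effect-name entries are plain strings
    print(k, v.shape if hasattr(v, 'shape') else v)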

Test how the DataLoader behaves in dict pipeline mode:

dataset = DualEffectsDataset(data_path)
train_dl = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
batch = next(iter(train_dl))
print("batch =\n",batch)
augs = Stereo(), PhaseFlipper()
effects_list =  ['Gain', 'BandPassFilter', 'BandStopFilter', 'HighPassFilter', 'LowPassFilter']
AudioDataset:2 files found.
batch =
 {'a': tensor([[[-3.0239e-04, -3.8517e-04, -6.0043e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-3.0239e-04, -3.8517e-04, -6.0043e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00]],

        [[ 2.9149e-01,  2.2990e-01,  1.7710e-01,  ..., -2.4063e-02,
          -2.3992e-02, -2.1247e-02],
         [ 1.1003e-04,  1.6797e-04,  1.3461e-04,  ..., -5.7370e-03,
          -5.6048e-03, -5.5120e-03]]]), 'b': tensor([[[-0.0003, -0.0004, -0.0006,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0003, -0.0004, -0.0006,  ...,  0.0000,  0.0000,  0.0000]],

        [[ 0.1027,  0.0918,  0.0796,  ..., -0.0584, -0.0534, -0.0412],
         [ 0.0017, -0.0006, -0.0032,  ...,  0.1276,  0.1195,  0.1094]]]), 'a1': tensor([[[-3.0239e-04, -3.5309e-04, -4.4016e-04,  ..., -0.0000e+00,
          -0.0000e+00, -0.0000e+00],
         [-3.0239e-04, -3.5309e-04, -4.4016e-04,  ..., -0.0000e+00,
          -0.0000e+00, -0.0000e+00]],

        [[ 2.9149e-01,  2.4293e-01,  2.2399e-01,  ..., -4.3179e-02,
          -4.1475e-02, -3.8351e-02],
         [ 1.1003e-04,  1.5571e-04,  1.0806e-04,  ..., -5.5205e-03,
          -5.4740e-03, -5.4839e-03]]]), 'b1': tensor([[[-0.0003, -0.0004, -0.0004,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0003, -0.0004, -0.0004,  ...,  0.0000,  0.0000,  0.0000]],

        [[ 0.1027,  0.0933,  0.0856,  ..., -0.0665, -0.0634, -0.0530],
         [ 0.0017, -0.0003, -0.0020,  ..., -0.0320, -0.0415, -0.0496]]]), 'a2': tensor([[[-1.3810e-04, -1.7590e-04, -2.7421e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-1.3810e-04, -1.7590e-04, -2.7421e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00]],

        [[ 1.1102e-16, -5.3172e-02, -8.3176e-02,  ..., -7.6607e-03,
          -5.7607e-03, -1.7675e-03],
         [ 2.7105e-20,  5.0022e-05,  6.5531e-06,  ...,  1.7718e-04,
           2.2987e-04,  2.2429e-04]]]), 'b2': tensor([[[-2.1308e-04, -2.7141e-04, -4.2310e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00],
         [-2.1308e-04, -2.7141e-04, -4.2310e-04,  ...,  0.0000e+00,
           0.0000e+00,  0.0000e+00]],

        [[-6.9389e-17, -1.0133e-02, -1.9777e-02,  ..., -5.8415e-02,
          -6.6108e-02, -6.5935e-02],
         [-1.0842e-18, -2.1146e-03, -4.1841e-03,  ...,  4.9976e-03,
          -6.5876e-03, -1.8445e-02]]]), 'e1': ['BandStopFilter', 'BandStopFilter'], 'e2': ['Gain', 'HighPassFilter']}
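Note from the batch printout above that the default collate function stacks the audio tensors with a leading batch dimension and gathers the effect-name strings into per-batch lists. A quick sanity check (values shown are examples from the run above):

print(batch['a'].shape)   # e.g. torch.Size([2, 2, 65536]): batch, channels, samples
print(batch['e1'])        # e.g. ['BandStopFilter', 'BandStopFilter']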
batch = next(iter(train_dl))
a, b, a1, b1, a2, b2, e1, e2 = batch.values()     # clean chunks, their effected versions, and the two effect names
print("clean")
playable_spectrogram(a[0], output_type='live')
print(e1[0])
playable_spectrogram(a1[0], output_type='live')   # same chunk with effect e1 applied
print(e2[0])
playable_spectrogram(a2[0], output_type='live')   # same chunk with effect e2 applied
diff = a2[0] - a1[0]                              # difference between the two effected versions
playable_spectrogram(diff, output_type='live')