API details.

Some audio data urls:

URLs.AUDIOMDPI = 'https://zenodo.org/record/3562442'

URLs.MARCO = URLs.AUDIOMDPI # just a shorthand alias I'm more likely to remember

URLs.SIGNALTRAIN_LA2A_1_1 = 'https://zenodo.org/record/3824876'

URLs.SIGNALTRAIN_LA2A_REDUCED = 'http://hedges.belmont.edu/data/SignalTrain_LA2A_Reduced.tgz'

zenodo_url_to_data_url[source]

zenodo_url_to_data_url(url)

print(URLs.MARCO)
print(zenodo_url_to_data_url(URLs.MARCO))
https://zenodo.org/record/3562442
https://zenodo.org/api/files/d6589bb4-d6a6-4bc6-8e51-e6334fafbe3f/AudioMDPI.zip
print(URLs.SIGNALTRAIN_LA2A_1_1)
print(zenodo_url_to_data_url(URLs.SIGNALTRAIN_LA2A_1_1))
https://zenodo.org/record/3824876
https://zenodo.org/api/files/df302f12-7355-452e-93d1-b0c9344608f7/SignalTrain_LA2A_Dataset_1.1.tgz
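The record URL alone doesn't encode the direct file link; one plausible way to get it (an illustrative sketch, not necessarily how `zenodo_url_to_data_url` is implemented) is to extract the record id and query Zenodo's public records API, whose JSON response lists each file with its download link:

```python
import json
import urllib.request

def record_api_url(record_url):
    """Build the Zenodo records-API URL from a record page URL,
    e.g. 'https://zenodo.org/record/3562442' -> 'https://zenodo.org/api/records/3562442'."""
    record_id = record_url.rstrip('/').split('/')[-1]
    return f'https://zenodo.org/api/records/{record_id}'

def first_file_link(record_url):
    """Fetch the record's metadata and return the first file's download link.
    Requires network access; assumes Zenodo's 'files'/'links' JSON layout."""
    with urllib.request.urlopen(record_api_url(record_url)) as resp:
        meta = json.load(resp)
    return meta['files'][0]['links']['self']
```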

get_audio_data[source]

get_audio_data(url)

Try downloading a sample `.tgz` file:

path_st = get_audio_data(URLs.SIGNALTRAIN_LA2A_REDUCED)
path_st
Path('/home/shawley/.fastai/data/SignalTrain_LA2A_Reduced')

And try downloading from a Zenodo URL:

path_audiomdpi = get_audio_data(URLs.MARCO)
path_audiomdpi
Path('/home/shawley/.fastai/data/AudioMDPI')
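The destination paths above follow fastai's data-cache convention: the archive is extracted under `~/.fastai/data/`, named after the archive stem. A hypothetical helper (`cache_path` is not part of this library) that reproduces just the path logic, assuming that convention:

```python
from pathlib import Path

def cache_path(url, base='~/.fastai/data'):
    """Where a downloaded archive would be extracted, assuming fastai's
    ~/.fastai/data cache convention (illustrative only)."""
    name = url.split('/')[-1]
    for ext in ('.tar.gz', '.tgz', '.zip'):   # strip a known archive suffix
        if name.endswith(ext):
            name = name[:-len(ext)]
            break
    return Path(base).expanduser() / name

print(cache_path('http://hedges.belmont.edu/data/SignalTrain_LA2A_Reduced.tgz'))
```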

Let's use this data as an example and take a look at it:

path_audiomdpi.ls()
(#4) [Path('/home/shawley/.fastai/data/AudioMDPI/LeslieWoofer'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn'),Path('/home/shawley/.fastai/data/AudioMDPI/license.txt'),Path('/home/shawley/.fastai/data/AudioMDPI/6176ChannelStrip')]

We'll grab the `LeslieHorn` subset:

horn = path_audiomdpi / "LeslieHorn"; horn.ls()
(#4) [Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/readme.txt'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/chorale'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/tremolo'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/dry')]
path_dry = horn / 'dry'
#path_trem = horn / 'tremolo'
audio_extensions = ['.m3u', '.ram', '.au', '.snd', '.mp3', '.wav']
fnames_dry = get_files(path_dry, extensions=audio_extensions)
waveform, sample_rate = torchaudio.load(fnames_dry[0])

Let's take a look at it:

show_info[source]

show_info(waveform, sample_rate)

plot_waveform[source]

plot_waveform(waveform, sample_rate, ax=None, xlim=None, ylim=[-1, 1], color='blue', label='', title='Waveform')

Waveform plot, from https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html
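A minimal sketch of such a plot, with amplitude drawn against a time axis derived from `sample_rate`; the parameter names mirror the signature above, but the body is illustrative rather than the library's actual implementation:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')            # non-interactive backend for scripting
import matplotlib.pyplot as plt

def plot_waveform_sketch(waveform, sample_rate, ax=None, ylim=(-1, 1),
                         color='blue', label='', title='Waveform'):
    """Plot each channel of a (channels, samples) array vs. time in seconds."""
    waveform = np.atleast_2d(np.asarray(waveform))
    if ax is None:
        _, ax = plt.subplots()
    t = np.arange(waveform.shape[-1]) / sample_rate   # sample index -> seconds
    for ch in waveform:
        ax.plot(t, ch, color=color, label=label or None, linewidth=0.5)
    ax.set(xlabel='time (s)', ylabel='amplitude', ylim=ylim, title=title)
    return ax

# quick check with one second of a 440 Hz sine
sr = 22050
sine = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)[None, :]
ax = plot_waveform_sketch(sine, sr)
```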

plot_melspec[source]

plot_melspec(waveform, sample_rate, ax=None, ref=amax, vmin=-70, vmax=0)

Mel-spectrogram plot, from librosa documentation
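The `ref`, `vmin`, and `vmax` arguments follow librosa's dB-conversion conventions: power is measured relative to `ref` (by default the spectrogram's maximum, hence `ref=amax`) and the color scale is clipped to `[vmin, vmax]` dB. A numpy-only sketch of that conversion (librosa's `power_to_db` is the real thing; this just shows the arithmetic):

```python
import numpy as np

def power_to_db_sketch(S, ref=np.amax, vmin=-70.0):
    """Convert a power spectrogram to decibels relative to `ref`,
    flooring at `vmin` dB (mimics librosa.power_to_db plus a vmin clip)."""
    S = np.asarray(S, dtype=float)
    ref_value = ref(S) if callable(ref) else float(ref)
    db = 10.0 * np.log10(np.maximum(S, 1e-10) / ref_value)
    return np.maximum(db, vmin)

S = np.array([1.0, 0.1, 1e-9])
print(power_to_db_sketch(S))   # max bin -> 0 dB, 0.1 -> -10 dB, tiny values floored at vmin
```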

play_audio[source]

play_audio(waveform, sample_rate)

Adapted from the torchaudio preprocessing tutorial. Note that the IPython docs state that `Audio` can already handle multichannel input: "Can also do stereo or more channels".
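Under the hood, `IPython.display.Audio` only needs PCM samples and a rate; a standard-library sketch of how a float waveform becomes 16-bit WAV bytes (illustrative — `Audio(waveform, rate=sample_rate)` handles this for you):

```python
import io
import wave
import numpy as np

def to_wav_bytes(waveform, sample_rate):
    """Encode a (channels, samples) float array in [-1, 1] as 16-bit PCM WAV bytes."""
    waveform = np.atleast_2d(np.asarray(waveform))
    pcm = (np.clip(waveform, -1, 1) * 32767).astype('<i2')   # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(pcm.shape[0])
        w.setsampwidth(2)                    # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm.T.tobytes())       # interleave channels frame by frame
    return buf.getvalue()

wav = to_wav_bytes(np.zeros((1, 100)), 44100)
print(wav[:4])   # b'RIFF'
```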

show_audio[source]

show_audio(waveform, sample_rate, info=True, play=True, plots=['waveform', 'melspec'], ref=500, mc_plot=False)

This display routine is an amalgam of the torchaudio tutorial and the librosa documentation:

show_audio(waveform, sample_rate)
Shape: (1, 110250), Dtype: torch.float32, Duration: 2.5 s
Max:  1.000,  Min: -0.973, Mean: -0.000, Std Dev:  0.086
show_audio(waveform, sample_rate, info=False, play=False, plots=['melspec'], ref=1)
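The stats line printed above is easy to reproduce; a numpy sketch with the same fields (formatting approximate, not the library's own `show_info` code):

```python
import numpy as np

def info_sketch(waveform, sample_rate):
    """Summarize a (channels, samples) array the way the info printout does."""
    w = np.atleast_2d(np.asarray(waveform))
    duration = w.shape[-1] / sample_rate     # seconds
    return (f"Shape: {tuple(w.shape)}, Dtype: {w.dtype}, Duration: {duration} s\n"
            f"Max: {w.max():6.3f},  Min: {w.min():6.3f}, "
            f"Mean: {w.mean():6.3f}, Std Dev: {w.std():6.3f}")

w = np.zeros((1, 110250), dtype=np.float32)
print(info_sketch(w, 44100))
```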

Multichannel Concerns:

Let's make a multi-channel tensor and "show" it:

num_channels = 5
n = waveform.shape[-1] * 3                    # buffer long enough for staggered starts
waveform2 = torch.zeros((num_channels, n))
for c in range(num_channels):
    # random offset within the first 2/3 of the buffer, so each clip fits;
    # assumes every file is the same length as `waveform`
    start = int(np.random.rand() * waveform.shape[-1] * 2)
    this_waveform, _ = torchaudio.load(fnames_dry[c])
    waveform2[c, start:start+waveform.shape[-1]] = this_waveform
show_audio(waveform2, sample_rate)
Shape: (5, 330750), Dtype: torch.float32, Duration: 7.5 s
Max:  1.000,  Min: -1.000, Mean: -0.000, Std Dev:  0.037
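With five channels, overlaying everything on one axis gets crowded; an `mc_plot`-style option might instead give each channel its own subplot. A sketch of that idea (illustrative only; `show_audio`'s actual multichannel plotting may differ):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')            # non-interactive backend for scripting
import matplotlib.pyplot as plt

def plot_channels_sketch(waveform, sample_rate):
    """One subplot per channel with a shared time axis."""
    w = np.atleast_2d(np.asarray(waveform))
    t = np.arange(w.shape[-1]) / sample_rate
    fig, axes = plt.subplots(w.shape[0], 1, sharex=True, squeeze=False)
    for i, ax in enumerate(axes[:, 0]):
        ax.plot(t, w[i], linewidth=0.5)
        ax.set_ylabel(f'ch {i}')
    axes[-1, 0].set_xlabel('time (s)')
    return fig

fig = plot_channels_sketch(np.zeros((5, 1000)), 44100)
print(len(fig.axes))   # 5
```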