Console script and callable function for preprocessing a dataset of disparate-sized audio files into uniform chunks.
Note: Duplicates the directory structure(s) referenced by input paths.
```
$ chunkadelic -h
usage: chunkadelic [-h] [--chunk_size CHUNK_SIZE] [--sr SR] [--norm [{False,global,channel}]]
                   [--spacing SPACING] [--strip] [--thresh THRESH] [--bits BITS]
                   [--workers WORKERS] [--nomix] [--nopad] [--verbose] [--debug]
                   output_path input_paths [input_paths ...]

positional arguments:
  output_path           Path of output for chunkified data
  input_paths           Path(s) of a file or a folder of files. (recursive)

options:
  -h, --help            show this help message and exit
  --chunk_size CHUNK_SIZE
                        Length of chunks (default: 131072)
  --sr SR               Output sample rate (default: 48000)
  --norm [{False,global,channel}]
                        Normalize audio, based on the max of the absolute value
                        [global/channel/False] (default: False)
  --spacing SPACING     Spacing factor, advance this fraction of a chunk per copy (default: 0.5)
  --strip               Strips silence: chunks with max dB below <thresh> are not outputted
                        (default: False)
  --thresh THRESH       threshold in dB for determining what constitutes silence (default: -70)
  --bits BITS           Bit depth: "None" uses torchaudio default | "match"=match input audio
                        files | or specify an int (default: None)
  --workers WORKERS     Maximum number of workers to use (default: all) (default: 20)
  --nomix               (BDCT Dataset specific) exclude output of "*/Audio Files/*Mix*"
                        (default: False)
  --nopad               Disable zero padding for audio shorter than chunk_size (default: False)
  --verbose             Extra output logging (default: False)
  --debug               Extra EXTRA output logging (default: False)
```
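For example, to chunk everything under `examples/` into `test_chunks/` with the same settings used in the sequential test below (an illustrative invocation; every flag is documented in the help above):

```
$ chunkadelic --sr 48000 --chunk_size 131072 --norm global --nopad --verbose test_chunks examples/
```

As noted above, the output directory mirrors the directory structure of the input path(s).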
chunk_one_file chunks up one file by setting things up and then calling blow_chunks.
|  | Type | Details |
|---|---|---|
| filenames | list | list of filenames from which we'll pick one |
| args |  | output of argparse |
| file_ind |  | index from filenames list to read from |
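Per the help text, each output chunk advances by `spacing * chunk_size` samples through the source audio. The following is a minimal sketch of that indexing logic (not the library's actual code; `chunk_starts` is a hypothetical helper introduced here): with `chunk_size=131072` and `spacing=0.5` it reproduces the chunk shapes seen in the log output below.

```python
# Minimal sketch (not aeiou's actual implementation) of how overlapping
# chunk boundaries follow from chunk_size and the spacing factor.
def chunk_starts(n_samples: int, chunk_size: int = 131072, spacing: float = 0.5):
    "Return the start index of each (possibly overlapping) chunk."
    hop = int(spacing * chunk_size)   # advance this many samples per chunk copy
    return list(range(0, n_samples, hop))

# e.g. the 236983-sample stereo file from the log below:
starts = chunk_starts(236983)
lengths = [min(131072, 236983 - s) for s in starts]
print(starts)    # [0, 65536, 131072, 196608]
print(lengths)   # [131072, 131072, 105911, 40375]  <- matches the saved chunk shapes
```

With `--nopad` the trailing chunks stay shorter than `chunk_size`, as in the log below; otherwise they would be zero-padded.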
Testing sequential execution, one file at a time:
```python
import os  # for os.cpu_count() below
# get_audio_filenames and chunk_one_file are defined/imported earlier in this module

class AttrDict(dict):  # cf. https://stackoverflow.com/a/14620633/4259243
    "setup an object to hold args"
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

args = AttrDict()  # setup something akin to what argparse gives
args.update({'output_path': 'test_chunks', 'input_paths': ['examples/'], 'sr': 48000,
             'chunk_size': 131072, 'spacing': 0.5, 'norm': 'global', 'strip': False,
             'thresh': -70, 'nomix': False, 'verbose': True, 'nopad': True,
             'workers': min(32, os.cpu_count() + 4), 'debug': True, 'bits': 'match'})

filenames = get_audio_filenames(args.input_paths)
print("filenames =", filenames)
for i in range(len(filenames)):
    print(f"file {i+1}/{len(filenames)}: {filenames[i]}:")
    chunk_one_file(filenames, args, i)
```
```
filenames = ['examples/stereo_pewpew.mp3', 'examples/example.wav']
file 1/2: examples/stereo_pewpew.mp3:
--- process_one_file: filenames[0] = examples/stereo_pewpew.mp3
About to load filenames[0] = examples/stereo_pewpew.mp3
Resampling examples/stereo_pewpew.mp3 from 44100.0 Hz to 48000 Hz
We loaded the audio, audio.shape = torch.Size([2, 236983]). Setting bit rate.
Error with bits=match: Can't get audio medatadata. Choosing default=None
set_bit_rate: bits_per_sample = None
Bit rate set. Calling blow_chunks...
blow_chunks: audio.shape = torch.Size([2, 236983])
Saving output chunk test_chunks/stereo_pewpew--0.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 131072])
Saving output chunk test_chunks/stereo_pewpew--1.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 131072])
Saving output chunk test_chunks/stereo_pewpew--2.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 105911])
Saving output chunk test_chunks/stereo_pewpew--3.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 40375])
--- File 0: examples/stereo_pewpew.mp3 completed.
file 2/2: examples/example.wav:
--- process_one_file: filenames[1] = examples/example.wav
About to load filenames[1] = examples/example.wav
Resampling examples/example.wav from 44100 Hz to 48000 Hz
We loaded the audio, audio.shape = torch.Size([1, 55728]). Setting bit rate.
set_bit_rate: bits_per_sample = 16
Bit rate set. Calling blow_chunks...
blow_chunks: audio.shape = torch.Size([1, 55728])
Saving output chunk test_chunks/example--0.wav, bits_per_sample=16, chunk.shape=torch.Size([1, 55728])
--- File 1: examples/example.wav completed.
```
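As a quick sanity check (a hypothetical follow-up, not part of the library), one can list what was written to the output directory:

```python
import os

chunk_files = sorted(os.listdir('test_chunks'))
print(len(chunk_files), "chunks written:", chunk_files)
# expect the five files saved above: four stereo_pewpew--*.mp3 chunks and example--0.wav
```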
The main executable chunkadelic does the same as the previous sequential execution, albeit in parallel.
Note: Restrictions in Python's ProcessPoolExecutor prevent directly invoking parallel execution of chunk_one_file from interactive mode or inside a Jupyter notebook: you must use the CLI (or call it via subprocess).
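For instance, here is a minimal sketch of launching the CLI from a notebook via subprocess (paths reuse the example above; the flags are the ones documented in the help output):

```python
import subprocess

# Run the chunkadelic CLI as a child process so that its ProcessPoolExecutor
# parallelism works even when invoked from inside a Jupyter notebook.
cmd = ["chunkadelic", "--sr", "48000", "--norm", "global", "--nopad", "--verbose",
       "test_chunks", "examples/"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
result.check_returncode()  # raise if chunkadelic exited with an error
```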