Console script and callable function for preprocessing a dataset of disparate-sized audio files into uniform chunks.
Note: Duplicates the directory structure(s) referenced by input paths.
```
$ chunkadelic -h
usage: chunkadelic [-h] [--chunk_size CHUNK_SIZE] [--sr SR] [--norm [{False,global,channel}]]
                   [--spacing SPACING] [--strip] [--thresh THRESH] [--bits BITS]
                   [--workers WORKERS] [--nomix] [--nopad] [--verbose] [--debug]
                   output_path input_paths [input_paths ...]

positional arguments:
  output_path           Path of output for chunkified data
  input_paths           Path(s) of a file or a folder of files. (recursive)

options:
  -h, --help            show this help message and exit
  --chunk_size CHUNK_SIZE
                        Length of chunks (default: 131072)
  --sr SR               Output sample rate (default: 48000)
  --norm [{False,global,channel}]
                        Normalize audio, based on the max of the absolute value
                        [global/channel/False] (default: False)
  --spacing SPACING     Spacing factor, advance this fraction of a chunk per copy (default: 0.5)
  --strip               Strips silence: chunks with max dB below <thresh> are not outputted
                        (default: False)
  --thresh THRESH       threshold in dB for determining what constitutes silence (default: -70)
  --bits BITS           Bit depth: "None" uses torchaudio default | "match"=match input audio
                        files | or specify an int (default: None)
  --workers WORKERS     Maximum number of workers to use (default: all) (default: 20)
  --nomix               (BDCT Dataset specific) exclude output of "*/Audio Files/*Mix*"
                        (default: False)
  --nopad               Disable zero padding for audio shorter than chunk_size (default: False)
  --verbose             Extra output logging (default: False)
  --debug               Extra EXTRA output logging (default: False)
```
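For example, to chunk everything under `examples/` into `test_chunks/` with the same settings used in the sequential test below (an illustrative invocation; every flag is documented in the help above):

```
$ chunkadelic --sr 48000 --chunk_size 131072 --norm global --nopad --verbose test_chunks examples/
```

As noted above, the output directory mirrors the directory structure of the input path(s).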
chunk_one_file chunks up one file by setting things up and then calling blow_chunks.
|  | Type | Details |
|---|---|---|
| filenames | list | list of filenames from which we'll pick one |
| args |  | output of argparse |
| file_ind |  | index from filenames list to read from |
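Per the help text, each output chunk advances by `spacing * chunk_size` samples through the source audio. The following is a minimal sketch of that indexing logic (not the library's actual code; `chunk_starts` is a hypothetical helper introduced here): with `chunk_size=131072` and `spacing=0.5` it reproduces the chunk shapes seen in the log output below.

```python
# Minimal sketch (not aeiou's actual implementation) of how overlapping
# chunk boundaries follow from chunk_size and the spacing factor.
def chunk_starts(n_samples: int, chunk_size: int = 131072, spacing: float = 0.5):
    "Return the start index of each (possibly overlapping) chunk."
    hop = int(spacing * chunk_size)   # advance this many samples per chunk copy
    return list(range(0, n_samples, hop))

# e.g. the 236983-sample stereo file from the log below:
starts = chunk_starts(236983)
lengths = [min(131072, 236983 - s) for s in starts]
print(starts)    # [0, 65536, 131072, 196608]
print(lengths)   # [131072, 131072, 105911, 40375]  <- matches the saved chunk shapes
```

With `--nopad` the trailing chunks stay shorter than `chunk_size`, as in the log below; otherwise they would be zero-padded.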
Testing sequential execution, one file at a time:
```python
import os  # for os.cpu_count() below
# get_audio_filenames and chunk_one_file are defined/imported earlier in this module

class AttrDict(dict):  # cf. https://stackoverflow.com/a/14620633/4259243
    "setup an object to hold args"
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

args = AttrDict()  # setup something akin to what argparse gives
args.update({'output_path': 'test_chunks', 'input_paths': ['examples/'], 'sr': 48000,
             'chunk_size': 131072, 'spacing': 0.5, 'norm': 'global', 'strip': False,
             'thresh': -70, 'nomix': False, 'verbose': True, 'nopad': True,
             'workers': min(32, os.cpu_count() + 4), 'debug': True, 'bits': 'match'})

filenames = get_audio_filenames(args.input_paths)
print("filenames =", filenames)
for i in range(len(filenames)):
    print(f"file {i+1}/{len(filenames)}: {filenames[i]}:")
    chunk_one_file(filenames, args, i)
```
```
filenames = ['examples/stereo_pewpew.mp3', 'examples/example.wav']
file 1/2: examples/stereo_pewpew.mp3:
--- process_one_file: filenames[0] = examples/stereo_pewpew.mp3
About to load filenames[0] = examples/stereo_pewpew.mp3
Resampling examples/stereo_pewpew.mp3 from 44100.0 Hz to 48000 Hz
We loaded the audio, audio.shape = torch.Size([2, 236983]). Setting bit rate.
Error with bits=match: Can't get audio medatadata. Choosing default=None
set_bit_rate: bits_per_sample = None
Bit rate set. Calling blow_chunks...
blow_chunks: audio.shape = torch.Size([2, 236983])
Saving output chunk test_chunks/stereo_pewpew--0.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 131072])
Saving output chunk test_chunks/stereo_pewpew--1.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 131072])
Saving output chunk test_chunks/stereo_pewpew--2.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 105911])
Saving output chunk test_chunks/stereo_pewpew--3.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 40375])
--- File 0: examples/stereo_pewpew.mp3 completed.
file 2/2: examples/example.wav:
--- process_one_file: filenames[1] = examples/example.wav
About to load filenames[1] = examples/example.wav
Resampling examples/example.wav from 44100 Hz to 48000 Hz
We loaded the audio, audio.shape = torch.Size([1, 55728]). Setting bit rate.
set_bit_rate: bits_per_sample = 16
Bit rate set. Calling blow_chunks...
blow_chunks: audio.shape = torch.Size([1, 55728])
Saving output chunk test_chunks/example--0.wav, bits_per_sample=16, chunk.shape=torch.Size([1, 55728])
--- File 1: examples/example.wav completed.
```
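As a quick sanity check (a hypothetical follow-up, not part of the library), one can list what was written to the output directory:

```python
import os

chunk_files = sorted(os.listdir('test_chunks'))
print(len(chunk_files), "chunks written:", chunk_files)
# expect the five files saved above: four stereo_pewpew--*.mp3 chunks and example--0.wav
```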
The main executable chunkadelic does the same as the previous sequential execution, albeit in parallel.
Note: Restrictions in Python's ProcessPoolExecutor prevent directly invoking parallel execution of chunk_one_file from interactive mode or inside a Jupyter notebook: you must use the CLI (or call it via subprocess).
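For instance, here is a minimal sketch of launching the CLI from a notebook via subprocess (paths reuse the example above; the flags are the ones documented in the help output):

```python
import subprocess

# Run the chunkadelic CLI as a child process so that its ProcessPoolExecutor
# parallelism works even when invoked from inside a Jupyter notebook.
cmd = ["chunkadelic", "--sr", "48000", "--norm", "global", "--nopad", "--verbose",
       "test_chunks", "examples/"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
result.check_returncode()  # raise if chunkadelic exited with an error
```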