chunkadelic

Console script and callable function for preprocessing a dataset of disparate-sized audio files into uniform chunks.

Note: Duplicates the directory structure(s) referenced by input paths.

$ chunkadelic -h 
usage: chunkadelic [-h] [--chunk_size CHUNK_SIZE] [--sr SR] [--norm [{False,global,channel}]] [--spacing SPACING] [--strip] [--thresh THRESH]
                   [--bits BITS] [--workers WORKERS] [--nomix] [--nopad] [--verbose] [--debug]
                   output_path input_paths [input_paths ...]

positional arguments:
  output_path           Path of output for chunkified data
  input_paths           Path(s) of a file or a folder of files. (recursive)

options:
  -h, --help            show this help message and exit
  --chunk_size CHUNK_SIZE
                        Length of chunks (default: 131072)
  --sr SR               Output sample rate (default: 48000)
  --norm [{False,global,channel}]
                        Normalize audio, based on the max of the absolute value [global/channel/False] (default: False)
  --spacing SPACING     Spacing factor, advance this fraction of a chunk per copy (default: 0.5)
  --strip               Strips silence: chunks with max dB below <thresh> are not outputted (default: False)
  --thresh THRESH       threshold in dB for determining what constitutes silence (default: -70)
  --bits BITS           Bit depth: "None" uses torchaudio default | "match"=match input audio files | or specify an int (default: None)
  --workers WORKERS     Maximum number of workers to use (default: all) (default: 20)
  --nomix               (BDCT Dataset specific) exclude output of "*/Audio Files/*Mix*" (default: False)
  --nopad               Disable zero padding for audio shorter than chunk_size (default: False)
  --verbose             Extra output logging (default: False)
  --debug               Extra EXTRA output logging (default: False)
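
For example, a typical invocation (the paths here are hypothetical; every flag shown is documented above) might look like:

$ chunkadelic --chunk_size 131072 --sr 48000 --norm global --strip my_chunks my_raw_audio/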

source

blow_chunks

 blow_chunks (audio:torch.Tensor, new_filename:str, chunk_size:int,
              sr=48000, norm='False', spacing=0.5, strip=False, thresh=-70,
              bits_per_sample=None, nopad=False, debug=False)

chunks up the audio and saves each chunk with --{i} appended to the end of its filename

|  | Type | Default | Details |
|---|---|---|---|
| audio | tensor |  | long audio file to be chunked |
| new_filename | str |  | stem of new filename(s) to be output as chunks |
| chunk_size | int |  | how big each audio chunk is, in samples |
| sr | int | 48000 | audio sample rate in Hz |
| norm | str | False | normalize input audio, based on the max of the absolute value ['global', 'channel', or anything else for None, e.g. False] |
| spacing | float | 0.5 | fraction of each chunk to advance between hops |
| strip | bool | False | strip silence: chunks with max power in dB below this value will not be saved to files |
| thresh | int | -70 | threshold in dB for determining what counts as silence |
| bits_per_sample | NoneType | None | kwarg for torchaudio.save, None means use defaults |
| nopad | bool | False | disable zero-padding, allowing samples to be shorter than chunk_size (including "leftovers" on the "ends") |
| debug | bool | False | print debugging information |
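
To make the role of spacing concrete, here is a minimal sketch of the hop arithmetic it implies (an illustration, not the library's internal code): each chunk starts spacing * chunk_size samples after the previous one. The numbers below reproduce the chunk lengths seen in the stereo_pewpew log further down.

# Sketch only (assumption: this mirrors, but is not, blow_chunks' internals).
# With spacing=0.5, each chunk starts half a chunk after the previous one,
# so consecutive chunks overlap by 50%.
chunk_size, spacing, n_samples = 131072, 0.5, 236983   # values from the log below
hop = int(spacing * chunk_size)                         # 65536 samples per hop
for i, start in enumerate(range(0, n_samples, hop)):
    end = min(start + chunk_size, n_samples)            # with nopad, short tail chunks are kept as-is
    print(f"chunk {i}: samples [{start}:{end}), length {end - start}")
# -> lengths 131072, 131072, 105911, 40375, matching the saved chunk shapes below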

source

set_bit_rate

 set_bit_rate (bits, filename, debug=False)
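
No parameter table is given for set_bit_rate, but the --bits help text and the log output below suggest its behavior: "match" reads the bit depth from the input file's metadata (falling back to the default if the metadata can't be read), an integer is passed through, and anything else yields None so torchaudio chooses. A hypothetical re-implementation, for illustration only:

import torchaudio

def set_bit_rate_sketch(bits, filename, debug=False):
    "Hypothetical stand-in for set_bit_rate: resolve `bits` to a bits_per_sample value for torchaudio.save."
    if bits == 'match':
        try:
            return torchaudio.info(filename).bits_per_sample   # e.g. 16 for example.wav below
        except Exception as e:
            if debug: print(f"   Error with bits=match: {e}. Choosing default=None")
            return None
    try:
        return int(bits)          # explicit bit depth, e.g. --bits 24
    except (TypeError, ValueError):
        return None               # "None" (or None) -> use torchaudio's default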

source

chunk_one_file

 chunk_one_file (filenames:list, args, file_ind)

this chunks up one file by setting things up and then calling blow_chunks

|  | Type | Details |
|---|---|---|
| filenames | list | list of filenames from which we'll pick one |
| args |  | output of argparse |
| file_ind |  | index from filenames list to read from |

Testing sequential execution of chunk_one_file, one file at a time:

import os
from aeiou.core import get_audio_filenames  # assumed import path for this helper within the aeiou package

class AttrDict(dict): # cf. https://stackoverflow.com/a/14620633/4259243
    "setup an object to hold args"
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self
        
args = AttrDict()  # setup something akin to what argparse gives
args.update( {'output_path':'test_chunks', 'input_paths':['examples/'], 'sr':48000, 'chunk_size':131072, 'spacing':0.5,
    'norm':'global', 'strip':False, 'thresh':-70, 'nomix':False, 'verbose':True, 'nopad':True,
    'workers':min(32, os.cpu_count() + 4), 'debug':True, 'bits':'match' })

filenames = get_audio_filenames(args.input_paths)
print("filenames =",filenames)
for i in range(len(filenames)):
    print(f"file {i+1}/{len(filenames)}: {filenames[i]}:")
    chunk_one_file(filenames, args, i)
filenames = ['examples/stereo_pewpew.mp3', 'examples/example.wav']
file 1/2: examples/stereo_pewpew.mp3:
 --- process_one_file: filenames[0] = examples/stereo_pewpew.mp3

   About to load filenames[0] = examples/stereo_pewpew.mp3

Resampling examples/stereo_pewpew.mp3 from 44100.0 Hz to 48000 Hz
   We loaded the audio, audio.shape = torch.Size([2, 236983]).  Setting bit rate.
     Error with bits=match: Can't get audio medatadata. Choosing default=None
     set_bit_rate: bits_per_sample = None
   Bit rate set.  Calling blow_chunks...
       blow_chunks: audio.shape = torch.Size([2, 236983])
     Saving output chunk test_chunks/stereo_pewpew--0.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 131072])
     Saving output chunk test_chunks/stereo_pewpew--1.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 131072])
     Saving output chunk test_chunks/stereo_pewpew--2.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 105911])
     Saving output chunk test_chunks/stereo_pewpew--3.mp3, bits_per_sample=None, chunk.shape=torch.Size([2, 40375])
 --- File 0: examples/stereo_pewpew.mp3 completed.

file 2/2: examples/example.wav:
 --- process_one_file: filenames[1] = examples/example.wav

   About to load filenames[1] = examples/example.wav

Resampling examples/example.wav from 44100 Hz to 48000 Hz
   We loaded the audio, audio.shape = torch.Size([1, 55728]).  Setting bit rate.
     set_bit_rate: bits_per_sample = 16
   Bit rate set.  Calling blow_chunks...
       blow_chunks: audio.shape = torch.Size([1, 55728])
     Saving output chunk test_chunks/example--0.wav, bits_per_sample=16, chunk.shape=torch.Size([1, 55728])
 --- File 1: examples/example.wav completed.

The main executable chunkadelic does the same as the previous sequential execution, albeit in parallel.

Note: Restrictions in Python's ProcessPoolExecutor prevent directly invoking parallel execution of chunk_one_file from interactive mode or inside a Jupyter notebook; you must use the CLI (or call it via subprocess).


source

main

 main ()

Testing a CLI run, looping over the three --norm options (don't run this on GitHub CI or it will hang):
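
The test cell itself is not shown here; the following is a minimal sketch of what such a loop could look like, assuming subprocess and the same example paths used in the sequential test above (the exact commands in the original may differ):

import subprocess

for norm in ['False', 'global', 'channel']:   # the three --norm settings
    print("norm =", norm)
    # hypothetical invocation; output and input paths reuse the earlier example
    subprocess.run(['chunkadelic', '--norm', norm, 'test_chunks', 'examples/'], check=True)
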
norm = False

norm = global


norm = channel
