fad_embed

Command-line script to generate embeddings from audio files

Sample calling sequences

Single processor, single GPU:

fad_embed clap real/ fake/

Multiple GPUs, multiple processors (single node), run from within the main fad_pytorch package directory:

accelerate launch fad_pytorch/fad_embed.py clap real/ fake/

General invocation:

$ fad_embed -h
usage: fad_embed [-h] [--batch_size BATCH_SIZE] [--sample_size SAMPLE_SIZE] [--chunk_size CHUNK_SIZE] [--hop_size HOP_SIZE] [--max_hops MAX_HOPS] [--sr SR] [--verbose]
                 [--debug]
                 embed_model real_path fake_path

positional arguments:
  embed_model           choice of embedding model(s): clap | vggish | pann | openl3 | all
  real_path             Path of files of real audio
  fake_path             Path of files of fake audio

options:
  -h, --help            show this help message and exit
  --batch_size BATCH_SIZE
                        MAXIMUM Batch size for computing embeddings (may go smaller) (default: 64)
  --sample_size SAMPLE_SIZE
                        Number of audio samples to read from each audio file (default: 262144)
  --chunk_size CHUNK_SIZE
                        Length of chunks (in audio samples) to embed (default: 24000)
  --hop_size HOP_SIZE   (approximate) time difference (in seconds) between each chunk (default: 0.1)
  --max_hops MAX_HOPS   Don't exceed this many hops/chunks/embeddings per audio file. <= 0 disables this. (default: -1)
  --sr SR               sample rate (will resample inputs at this rate) (default: 48000)
  --verbose             Show notices of resampling when reading files (default: False)
  --debug               Extra messages for debugging this program (default: False)
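To make the interaction of --sample_size, --chunk_size, --hop_size, --max_hops and --sr concrete, here is a minimal sketch that estimates how many chunks (and hence embeddings) one file would yield. It assumes a plain sliding window whose hop is hop_size seconds converted to samples at the target rate; the help text calls the hop "(approximate)", so the script's actual chunking may differ. The function name count_chunks is hypothetical and not part of fad_pytorch.

```python
def count_chunks(sample_size=262144, chunk_size=24000, hop_size=0.1,
                 sr=48000, max_hops=-1):
    """Estimate the number of chunks per file under a sliding-window assumption."""
    # Assumption: the hop in samples is hop_size (seconds) times the sample rate.
    hop_samples = max(1, int(round(hop_size * sr)))
    if sample_size < chunk_size:
        return 0  # not enough audio for even one full chunk
    n = (sample_size - chunk_size) // hop_samples + 1
    # Per the help text, max_hops <= 0 disables the cap.
    if max_hops > 0:
        n = min(n, max_hops)
    return n


if __name__ == "__main__":
    print(count_chunks())             # CLI defaults
    print(count_chunks(max_hops=10))  # capped at 10 chunks per file
```

Under these assumptions the defaults give a hop of 4800 samples and 50 chunks per file, so lowering --max_hops is the most direct way to bound the number of embeddings per file.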

First, a couple of utilities for downloading checkpoints:


get_ckpt

 get_ckpt (ckpt_file='music_speech_audioset_epoch_15_esc_89.98.pt',
           ckpt_base_url='https://huggingface.co/lukewys/laion_clap/blob/main',
           ckpt_dl_path='/home/runner/checkpoints', accelerator=None)


download_if_needed

 download_if_needed (url, local_filename, accelerator=None)

Wrapper for download_file.


download_file

 download_file (url, local_filename)

Includes a progress bar. From https://stackoverflow.com/a/37573701/4259243
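The two download helpers above can be approximated with the standard library alone. The sketch below is not the package's implementation: the names download_file_sketch and download_if_needed_sketch are hypothetical, the progress display is a plain byte counter rather than a progress bar, and the accelerator argument (presumably used to coordinate multiple processes) is left out entirely.

```python
import os
import urllib.request


def download_file_sketch(url, local_filename, chunk_bytes=1 << 20):
    """Stream url to local_filename, printing a simple byte counter."""
    with urllib.request.urlopen(url) as resp, open(local_filename, "wb") as out:
        total = int(resp.headers.get("Content-Length") or 0)
        done = 0
        while True:
            chunk = resp.read(chunk_bytes)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            if total:
                print(f"\r{done}/{total} bytes", end="", flush=True)
    print()
    return local_filename


def download_if_needed_sketch(url, local_filename):
    """Download only if local_filename is missing. Returns True if a download ran."""
    if os.path.isfile(local_filename):
        return False
    # Create the target directory first so the open() call cannot fail on it.
    os.makedirs(os.path.dirname(os.path.abspath(local_filename)), exist_ok=True)
    download_file_sketch(url, local_filename)
    return True
```

Calling download_if_needed_sketch twice with the same arguments downloads once and then returns False, which is the behavior the name download_if_needed suggests.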


setup_embedder

 setup_embedder (model_choice='clap', device='cuda',
                 ckpt_file='music_speech_audioset_epoch_15_esc_89.98.pt',
                 ckpt_base_url='https://huggingface.co/lukewys/laion_clap/resolve/main',
                 accelerator=None, ckpt_dl_path='/home/runner/checkpoints')

Load the embedder model.

| | Type | Default | Details |
|---|---|---|---|
| model_choice | str | clap | ‘clap’ \| ‘vggish’ \| ‘pann’ \| ‘openl3’ |
| device | str | cuda | |
| ckpt_file | str | music_speech_audioset_epoch_15_esc_89.98.pt | NOTE: ‘CLAP_CKPT’ env var overrides ckpt_file kwarg |
| ckpt_base_url | str | https://huggingface.co/lukewys/laion_clap/resolve/main | Full default checkpoint URL: https://huggingface.co/lukewys/laion_clap/resolve/main/music_speech_audioset_epoch_15_esc_89.98.pt |
| accelerator | NoneType | None | |
| ckpt_dl_path | str | /home/runner/checkpoints | |
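The note on ckpt_file and the full checkpoint URL in the table suggest how the download location is assembled. The following is a minimal sketch under two assumptions that the source does not spell out: that a non-empty CLAP_CKPT environment variable replaces the ckpt_file argument, and that the URL is simply the base URL joined to the file name. The helper name resolve_ckpt_url and the file name my_ckpt.pt are hypothetical.

```python
import os

DEFAULT_CKPT = "music_speech_audioset_epoch_15_esc_89.98.pt"
DEFAULT_BASE_URL = "https://huggingface.co/lukewys/laion_clap/resolve/main"


def resolve_ckpt_url(ckpt_file=DEFAULT_CKPT, ckpt_base_url=DEFAULT_BASE_URL):
    # Assumption: a non-empty CLAP_CKPT environment variable wins over the kwarg.
    ckpt_file = os.environ.get("CLAP_CKPT") or ckpt_file
    # Assumption: the download URL is the base URL, a slash, then the file name.
    return f"{ckpt_base_url.rstrip('/')}/{ckpt_file}"


if __name__ == "__main__":
    print(resolve_ckpt_url())
    os.environ["CLAP_CKPT"] = "my_ckpt.pt"  # hypothetical checkpoint name
    print(resolve_ckpt_url())
```

With CLAP_CKPT unset, the result matches the full checkpoint URL listed in the table.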

embedder, sample_rate = setup_embedder('openl3','cuda')

/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/librosa/util/decorators.py:88: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
  return f(*args, **kwargs)
Downloading: "https://github.com/torchopenl3/torchopenl3-models/raw/master/torchopenl3_mel256_music_512.pth.tar" to /home/shawley/.cache/torch/hub/checkpoints/torchopenl3_mel256_music_512.pth.tar
100%|██████████| 34.9M/34.9M [00:00<00:00, 249MB/s]


embed

 embed (args)


main

 main ()
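Neither embed nor main is documented here, but the help text near the top of this page fixes the shape of the command line. As a hedged reconstruction, the parser below reproduces those positional arguments, options and defaults with argparse. The argument types and the name build_parser are assumptions inferred from the help output, not taken from the package source, and how main passes the result to embed is likewise assumed.

```python
import argparse


def build_parser():
    """Rebuild the fad_embed command line from its published help text."""
    p = argparse.ArgumentParser(
        prog="fad_embed",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    p.add_argument("embed_model",
                   help="choice of embedding model(s): clap | vggish | pann | openl3 | all")
    p.add_argument("real_path", help="Path of files of real audio")
    p.add_argument("fake_path", help="Path of files of fake audio")
    # Types below are assumptions inferred from the defaults shown in the help.
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--sample_size", type=int, default=262144)
    p.add_argument("--chunk_size", type=int, default=24000)
    p.add_argument("--hop_size", type=float, default=0.1)
    p.add_argument("--max_hops", type=int, default=-1)
    p.add_argument("--sr", type=int, default=48000)
    p.add_argument("--verbose", action="store_true")
    p.add_argument("--debug", action="store_true")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["clap", "real/", "fake/"])
    print(args)  # presumably the kind of namespace that embed(args) receives
```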