hpc

routines for running on clusters

This module isn’t strictly for audio i/o, but it is nevertheless a normal part of Harmonai’s operations. The point of this package is to reduce code-copying between Harmonai projects.

Heads up: Huggingface accelerate support will likely be deprecated soon. We originally found accelerate necessary because of problems running PyTorch Lightning on multiple nodes, but those problems have since been resolved. Thus we will likely be switching to Lightning, so you will see that dependency being added and perhaps accelerate being removed.


source

get_accel_config

 get_accel_config (filename='~/.cache/huggingface/accelerate/default_config.yaml')

get huggingface accelerate config info

Let’s test that:

ac = get_accel_config('examples/accel_config.yaml')
ac
{'compute_environment': 'LOCAL_MACHINE',
 'deepspeed_config': {},
 'distributed_type': 'MULTI_GPU',
 'fsdp_config': {},
 'machine_rank': 0,
 'main_process_ip': '',
 'main_process_port': 12332,
 'main_training_function': 'main',
 'mixed_precision': 'no',
 'num_machines': 2,
 'num_processes': 8,
 'use_cpu': False}
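Under the hood this is presumably just a YAML read; here's a minimal sketch (not the actual source), assuming PyYAML and expanding `~` in the path:

```python
import os
import yaml  # PyYAML

def get_accel_config_sketch(filename='~/.cache/huggingface/accelerate/default_config.yaml'):
    """Minimal sketch: load a huggingface accelerate config YAML file into a dict."""
    with open(os.path.expanduser(filename)) as f:
        return yaml.safe_load(f)
```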

Next is a little utility to replace print, which only prints on the cluster head node. Note that you can only pass one string to hprint, so use f-strings. Also we use ANSI codes to color the text (currently cyan) to help it stand out from all the other text that’s probably scrolling by!


source

HostPrinter

 HostPrinter (accelerator, tag='\x1b[96m', untag='\x1b[0m')

lil accelerate utility for only printing on host node

|             | Type | Default    | Details |
|-------------|------|------------|---------|
| accelerator |      |            | huggingface accelerator object |
| tag         | str  | '\x1b[96m' | starting color |
| untag       | str  | '\x1b[0m'  | reset to default color |

Here’s a test:

accelerator = accelerate.Accelerator()
device = accelerator.device
hprint = HostPrinter(accelerator)  # hprint only prints on head node
hprint(f'Using device: {device}')
Using device: cpu
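If you're curious how such a utility works, here's a minimal sketch (`HostPrinterSketch` is a hypothetical stand-in, not the actual source): it only relies on the accelerator's `is_main_process` flag, and wraps the message in the ANSI color codes.

```python
class HostPrinterSketch:
    """Sketch of HostPrinter: print only on the main (head) process,
    wrapping the message in ANSI color codes (cyan by default)."""
    def __init__(self, accelerator, tag='\x1b[96m', untag='\x1b[0m'):
        self.accelerator = accelerator
        self.tag, self.untag = tag, untag

    def __call__(self, s: str):
        if self.accelerator.is_main_process:  # accelerate exposes this flag
            print(self.tag + s + self.untag)
```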

PyTorch+Accelerate Model routines

For when the model is wrapped in an accelerate accelerator


source

save

 save (accelerator, args, model, opt=None, epoch=None, step=None)

for checkpointing & model saves

|             | Type     | Default | Details |
|-------------|----------|---------|---------|
| accelerator |          |         | Huggingface accelerator object |
| args        |          |         | prefigure args dict (we only use args.name) |
| model       |          |         | the model, pre-unwrapped |
| opt         | NoneType | None    | optimizer state |
| epoch       | NoneType | None    | training epoch number |
| step        | NoneType | None    | training step number |

source

load

 load (accelerator, model, filename:str, opt=None)

load a saved model checkpoint

|             | Type     | Default | Details |
|-------------|----------|---------|---------|
| accelerator |          |         | Huggingface accelerator object |
| model       |          |         | an uninitialized model (pre-unwrapped) whose weights will be overwritten |
| filename    | str      |         | name of the checkpoint file |
| opt         | NoneType | None    | optimizer state, UNUSED FOR NOW |
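To make the intended save/load contract concrete, here's a hedged sketch of the round trip. These are hypothetical stand-ins, not the actual source: the real functions presumably use torch.save/torch.load on a .pth file, with pickle and a toy model substituted here for self-containedness.

```python
import pickle
from types import SimpleNamespace

def save_sketch(accelerator, args, model, opt=None, epoch=None, step=None):
    """Sketch: only the main process writes a checkpoint, named after args.name."""
    if not accelerator.is_main_process:
        return None
    ckpt = {'model': model.state_dict()}
    if opt is not None:   ckpt['opt'] = opt.state_dict()
    if epoch is not None: ckpt['epoch'] = epoch
    if step is not None:  ckpt['step'] = step
    filename = f"{args.name}.pkl"   # stand-in; the real code writes a torch checkpoint
    with open(filename, 'wb') as f:
        pickle.dump(ckpt, f)
    return filename

def load_sketch(accelerator, model, filename: str, opt=None):
    """Sketch: read the checkpoint back and overwrite the model's weights."""
    with open(filename, 'rb') as f:
        ckpt = pickle.load(f)
    model.load_state_dict(ckpt['model'])
    return model
```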

Utils for Accelerate or Lightning

Be sure to “unwrap” any accelerate model before calling these


source

n_params

 n_params (module)

Returns the number of trainable parameters in a module. Be sure to use accelerator.unwrap_model when calling this.

|        | Details |
|--------|---------|
| module | raw PyTorch model/module, e.g. returned by accelerator.unwrap_model() |
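The standard PyTorch idiom for this count looks like the sketch below; it only needs the module's `parameters()` iterator, so the toy stand-in avoids a torch dependency (the function name is a hypothetical stand-in for illustration).

```python
def n_params_sketch(module) -> int:
    """Count elements of all trainable parameters (the usual PyTorch idiom)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```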

source

freeze

 freeze (model)

freezes model weights; turns off gradient info. If using accelerate, call accelerator.unwrap_model on the model before calling this.

|       | Details |
|-------|---------|
| model | raw PyTorch model, e.g. returned by accelerator.unwrap_model() |
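Freezing is conventionally just a loop over parameters; a minimal sketch (assuming the standard `requires_grad` flag; the function name is a hypothetical stand-in):

```python
def freeze_sketch(model):
    """Turn off gradient tracking for every parameter (standard PyTorch pattern)."""
    for p in model.parameters():
        p.requires_grad = False
```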