API Reference

For more detail on how the function above is called, see the reference below.

aisfx.inference

aisfx.inference.get_path_modelWeights()

Return the relative path to the best model weights within the aiSFX package.

The path points to a .pickle file containing the best ray.tune checkpoint of model X-Sequential-CE.

Returns

Path to weights.

Return type

str

aisfx.inference.load_bestModel(weights, ds_dict, cuda)

Return best model: X-Sequential-CE.

Reloads the best CrossNet model, X-Sequential-CE, with saved weights on the specified torch.device().

Parameters
  • weights (torch.model.state_dict) – PyTorch model weights.

  • ds_dict (dict) – Dictionary of dataset names and number of classes for all datasets used in cross-dataset training.

  • cuda (str) – ‘cpu’ or ‘cuda’.

Returns

PyTorch model.

Return type

nn.Module

aisfx.inference.load_weights(cuda)

Load PyTorch model weights.

Loads weights using the path to a saved model checkpoint.

Parameters

cuda (str) – ‘cpu’ or ‘cuda’.

Returns

PyTorch model weights, together with a dictionary (dict) of dataset names and number of classes for all datasets used in cross-dataset training.

Return type

torch.model.state_dict

aisfx.inference.main(dir_audio, dir_export, spectrogram_type='essentia', spec_norm='original', norm_mn=None, norm_mx=None, drop_partial_block=True, emb_hop_size=0.5, use_cuda=True)

Extract and save embeddings from all files in a directory.

Uses the X-Sequential-CE model from the ISMIR paper.

Parameters
  • dir_audio (str) – Path to audio directory for processing.

  • dir_export (str) – Path to directory for saving extracted embeddings.

  • spectrogram_type (str) – Compute spectrograms with the ‘essentia’ or ‘librosa’ library. The original ISMIR paper uses ‘essentia’.

  • emb_hop_size (float) – Embedding hop size, a multiplier of the embedding block size. Original ISMIR paper uses 0.5 (1s hop).

  • use_cuda (bool) – Whether to use CUDA when it is available (default True); otherwise the CPU is used.
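
As a minimal usage sketch, the call below passes the defaults from the signature above explicitly. The wrapper name extract_embeddings is hypothetical and exists only for illustration; the import is done inside the function so the snippet can be read and loaded without aisfx installed.

```python
def extract_embeddings(dir_audio, dir_export):
    """Hypothetical wrapper around aisfx.inference.main (illustration only)."""
    # Imported lazily; requires the aisfx package to be installed when called.
    from aisfx import inference

    # Keyword values mirror the defaults shown in the documented signature.
    inference.main(
        dir_audio,
        dir_export,
        spectrogram_type='essentia',
        spec_norm='original',
        drop_partial_block=True,
        emb_hop_size=0.5,
        use_cuda=True,
    )
```

Calling extract_embeddings('path/to/audio', 'path/to/export') would then process every file under the audio directory; both paths here are placeholders.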

aisfx.inference.model_get_embedding(spec, emb_hop_size, drop_partial_block, model, cuda)

Extract embeddings using a PyTorch model.

Processes Tensor spectrogram input through PyTorch model on specified torch.device().

Parameters
  • spec (np.ndarray) – 2D Spectrogram data.

  • emb_hop_size (float) – Embedding hop size, a multiplier of the embedding block size.

  • drop_partial_block (bool) – Whether to drop the last block if incomplete.

  • model (torch.nn.Module) – PyTorch model.

  • cuda (str) – ‘cpu’ or ‘cuda’.

Returns

The extracted embeddings, np.ndarray([num_frames, 512]), where num_frames is the number of frames based on EMB_BLOCK_LENGTH and emb_hop_size, and 512 is the embedding dimensionality.

Return type

np.ndarray
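
The functions in this module can plausibly be chained as sketched below. This is an assumption-laden illustration, not the package's documented workflow: it assumes load_weights returns the weights and the dataset dictionary as a pair, and it does not show how spec should be computed or normalized beforehand. The wrapper name embed_spectrogram is hypothetical.

```python
def embed_spectrogram(spec, emb_hop_size=0.5, drop_partial_block=True, use_cuda=True):
    """Hypothetical helper chaining the aisfx calls documented above."""
    # Imported lazily; requires the aisfx package to be installed when called.
    from aisfx import inference, utils

    # 'cpu' or 'cuda', depending on the request and on availability.
    cuda = utils.cuda_select(use_cuda=use_cuda)

    # Assumption: load_weights returns (weights, ds_dict) as a pair.
    weights, ds_dict = inference.load_weights(cuda)
    model = inference.load_bestModel(weights, ds_dict, cuda)

    # spec is a 2D np.ndarray spectrogram prepared by the caller.
    return inference.model_get_embedding(
        spec, emb_hop_size, drop_partial_block, model, cuda
    )
```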

aisfx.preprocessing

aisfx.preprocessing.blocking(data, block_size, hop_size, drop_partial_block)

Block spectrograms.

Chunk the 2D data based on block_size and hop_size.

Parameters
  • data (np.array) – 2D numpy array to process.

  • block_size (int) – Number of frames.

  • hop_size (int) – Number of frames.

  • drop_partial_block (bool) – Whether to drop the last block if incomplete.

Returns

The chunked data, np.ndarray([num_blocks, block_size, data.shape[1]]).

Return type

np.ndarray
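
To illustrate the idea of blocking, here is a minimal NumPy sketch. It is not the package's implementation: the function name blocking_sketch is hypothetical, and zero-padding the final incomplete block is an assumption about what happens when drop_partial_block is False.

```python
import numpy as np


def blocking_sketch(data, block_size, hop_size, drop_partial_block):
    """Chunk a 2D array into overlapping blocks along its first axis."""
    if hop_size <= 0:
        raise ValueError("hop_size must be a positive number of frames")

    blocks = []
    start = 0
    while start < data.shape[0]:
        block = data[start:start + block_size]
        if block.shape[0] < block_size:
            # Incomplete final block: drop it, or zero-pad it (assumption).
            if not drop_partial_block:
                pad = np.zeros(
                    (block_size - block.shape[0], data.shape[1]), dtype=data.dtype
                )
                blocks.append(np.concatenate([block, pad], axis=0))
            break
        blocks.append(block)
        start += hop_size

    if not blocks:
        return np.empty((0, block_size, data.shape[1]), dtype=data.dtype)
    # Result shape: [num_blocks, block_size, data.shape[1]]
    return np.stack(blocks)
```

With 10 frames, block_size=4 and hop_size=2, this sketch yields 4 full blocks, plus one padded block when partial blocks are kept.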

aisfx.preprocessing.compute_hopSize(block_size, hop_multiplier, data)

Compute hop size as a portion of block_size.

Use to determine the amount of overlap you want between frames.

Parameters
  • block_size (int) – Block size.

  • hop_multiplier (float) – The amount of block size to be used as hop size.

  • data (np.ndarray) – Data that is being blocked.

Returns

The computed hop size.

Return type

int
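
The relationship between block size and hop size can be sketched as follows. This is only an illustration: the name compute_hop_size_sketch is hypothetical, rounding and the lower bound of 1 are assumptions, and the data argument is left out because its role is not described above.

```python
def compute_hop_size_sketch(block_size, hop_multiplier):
    """Return a hop size in frames as a fraction of block_size."""
    # Round to the nearest frame (assumption) and never return less than 1,
    # so that stepping through the data always makes progress.
    hop = int(round(block_size * hop_multiplier))
    return max(1, hop)
```

A multiplier of 0.5 gives 50% overlap between consecutive blocks, and a multiplier of 1.0 gives no overlap.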

aisfx.preprocessing.spectrogram_normalize(spec, spec_norm='original', mn=None, mx=None)

Normalize spectrograms.

Uses MinMax normalization.

Parameters
  • spec (np.ndarray) – Spectrogram data.

  • spec_norm (str) – ‘original’ uses the same cross-dataset training values as in the ISMIR paper, ‘local’ uses the min/max of spec, and ‘user’ lets the caller set mn and mx through the respective arguments.

  • mn (float) – Minimum spectrogram value for normalization. Ignored unless spec_norm=‘user’.

  • mx (float) – Maximum spectrogram value for normalization. Ignored unless spec_norm=‘user’.

Returns

The normalized spectrogram.

Return type

np.ndarray
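
The ‘local’ and ‘user’ modes can be illustrated with a standard min-max formula, (spec - mn) / (mx - mn). The sketch below is not the package's code: the name minmax_normalize_sketch is hypothetical, the [0, 1] output range is an assumption, and the ‘original’ mode is omitted because its training values are not listed here.

```python
import numpy as np


def minmax_normalize_sketch(spec, spec_norm='local', mn=None, mx=None):
    """Min-max normalize a spectrogram (sketch of 'local' and 'user' modes)."""
    if spec_norm == 'local':
        # Use the minimum and maximum of this spectrogram.
        mn, mx = float(spec.min()), float(spec.max())
    elif spec_norm == 'user':
        if mn is None or mx is None:
            raise ValueError("spec_norm='user' requires both mn and mx")
    else:
        raise ValueError("this sketch only covers 'local' and 'user'")

    if mx == mn:
        # Constant input: avoid division by zero.
        return np.zeros_like(spec, dtype=float)
    return (spec - mn) / (mx - mn)
```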

aisfx.utils

aisfx.utils.cuda_select(use_cuda=True)

Return torch.device using cpu or CUDA.

Device chosen based on (i) user-desired device and (ii) CUDA availability.

Parameters

use_cuda (bool) – User-desired device. By default, CUDA is used if available.

Returns

‘cpu’ or ‘cuda’.

Return type

str
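
The selection logic described above amounts to a single condition, sketched below. The function name select_device_sketch and its cuda_available parameter are hypothetical; in practice the availability flag would typically come from torch.cuda.is_available().

```python
def select_device_sketch(use_cuda=True, cuda_available=False):
    """Pick 'cuda' only if the user wants it and it is available."""
    # cuda_available stands in for torch.cuda.is_available() so that this
    # sketch runs without PyTorch installed.
    if use_cuda and cuda_available:
        return 'cuda'
    return 'cpu'
```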

aisfx.utils.get_audioPaths(directory)

Return a list of audio files to process.

Recursively collects all paths in the directory.

The directory must contain only audio files.

Parameters

directory (str) – Path to the folder containing audio files to process.

Returns

The list of audio files to process.

Return type

list[str]
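
A recursive collection of this kind can be sketched with the standard library. The function name collect_paths_sketch is hypothetical, and sorting the result is an assumption. Because no filtering by extension is done, every file found is returned, which is why the directory should contain only audio files.

```python
import os


def collect_paths_sketch(directory):
    """Recursively collect the paths of all files under directory."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.join(root, name))
    # Sorted for a deterministic processing order (assumption).
    return sorted(paths)
```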

aisfx.utils.spectrogram_selector(spectrogram_type)

Select a spectrogram computation method.

Choices are Essentia or Librosa.

Parameters

spectrogram_type (str) – Compute spectrograms with ‘essentia’ or ‘librosa’ library.

Returns

Function to compute the spectrograms.

Return type

function