API Reference

For more detail on how the function above is called, see the reference below.

aisfx.inference

aisfx.inference.get_path_modelWeights()

Return the relative path to the best model weights within the aiSFX package.

The path points to the .pickle file containing the best ray.tune checkpoint of model X-Sequential-CE.

Returns: Path to weights.
Return type: str

aisfx.inference.load_bestModel(weights, ds_dict, cuda)

Return best model: X-Sequential-CE.

Re-loads the best CrossNet model, X-Sequential-CE, with saved weights on the specified torch.device().

Parameters

weights (torch.model.state_dict) – PyTorch model weights.
ds_dict (dict) – Dictionary of dataset names and number of classes for all datasets used in cross-dataset training.
cuda (str) – ‘cpu’ or ‘cuda’.

Returns

PyTorch model.

Return type

nn.Module

aisfx.inference.load_weights(cuda)

Load PyTorch model weights.

Loads weights using the path to a saved model checkpoint.

Parameters: cuda (str) – ‘cpu’ or ‘cuda’.
Returns: PyTorch model weights, and a dictionary of dataset names and number of classes for all datasets used in cross-dataset training.
Return type: torch.model.state_dict, dict

aisfx.inference.main(dir_audio, dir_export, spectrogram_type='essentia', spec_norm='original', norm_mn=None, norm_mx=None, drop_partial_block=True, emb_hop_size=0.5, use_cuda=True)

Extract and save embeddings from all files in a directory.

Uses the X-Sequential-CE model from the ISMIR paper.

Parameters

dir_audio (str) – Path to audio directory for processing.
dir_export (str) – Path to directory for saving extracted embeddings.
spectrogram_type (str) – Compute spectrograms with the ‘essentia’ or ‘librosa’ library. The original ISMIR paper uses ‘essentia’.
emb_hop_size (float) – Embedding hop size, a multiplier of the embedding block size. The original ISMIR paper uses 0.5 (1 s hop).
use_cuda (bool) – Whether to use CUDA when it is available; otherwise the CPU is used.
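A minimal usage sketch built from the signature above. The wrapper name extract_embeddings and the example paths are hypothetical, and the call assumes the aisfx package is installed; the keyword values simply repeat the documented defaults.

```python
def extract_embeddings(dir_audio, dir_export):
    """Hypothetical wrapper around aisfx.inference.main using the documented defaults."""
    # Imported lazily so this module can be loaded without aisfx installed.
    from aisfx import inference

    inference.main(
        dir_audio,
        dir_export,
        spectrogram_type="essentia",  # the library used in the original ISMIR paper
        spec_norm="original",
        drop_partial_block=True,
        emb_hop_size=0.5,             # hop of half an embedding block
        use_cuda=True,                # signature default
    )


# Example call (paths are placeholders):
# extract_embeddings("path/to/audio", "path/to/embeddings")
```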

aisfx.inference.model_get_embedding(spec, emb_hop_size, drop_partial_block, model, cuda)

Extract embeddings using a PyTorch model.

Passes the spectrogram, converted to a Tensor, through the PyTorch model on the specified torch.device().

Parameters

spec (np.ndarray) – 2D Spectrogram data.
emb_hop_size (float) – Embedding hop size, a multiplier of the embedding block size.
drop_partial_block (bool) – Whether to drop the last block if incomplete.
model (torch.nn.Module) – PyTorch model.
cuda (str) – ‘cpu’ or ‘cuda’.

Returns

The extracted embeddings, np.ndarray([num_frames, 512]), where num_frames is the number of frames based on EMB_BLOCK_LENGTH and emb_hop_size, and 512 is the embedding dimensionality.

Return type

np.ndarray
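To make the num_frames relationship concrete, the sketch below estimates the output shape. It assumes one embedding per complete block and that incomplete trailing blocks are dropped. The function name is hypothetical and the value of EMB_BLOCK_LENGTH is not given here, so the block length is passed in explicitly.

```python
EMBEDDING_DIM = 512  # embedding dimensionality stated above


def expected_embedding_shape(num_spec_frames, block_length, hop_size):
    """Estimate the embedding array shape, assuming one embedding per complete block."""
    if num_spec_frames < block_length:
        # Not enough frames for even one complete block.
        return (0, EMBEDDING_DIM)
    num_blocks = (num_spec_frames - block_length) // hop_size + 1
    return (num_blocks, EMBEDDING_DIM)


# Illustrative numbers only: 10 spectrogram frames, blocks of 4 frames, hop of 2 frames.
print(expected_embedding_shape(10, 4, 2))
```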

aisfx.preprocessing

aisfx.preprocessing.blocking(data, block_size, hop_size, drop_partial_block)

Block spectrograms.

Chunk the 2D data based on block_size and hop_size.

Parameters

data (np.array) – 2D numpy array to process.
block_size (int) – Block length, in frames.
hop_size (int) – Hop between the starts of consecutive blocks, in frames.
drop_partial_block (bool) – Whether to drop the last block if incomplete.

Returns

The chunked data, np.ndarray([num_blocks, block_size, data.shape[1]]).

Return type

np.ndarray
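The chunking described above can be sketched with NumPy as follows. This is an illustration, not the package's implementation: blocking_sketch is a hypothetical name, frames are assumed to lie along axis 0, and zero-padding an incomplete final block when drop_partial_block is False is an assumption.

```python
import numpy as np


def blocking_sketch(data, block_size, hop_size, drop_partial_block=True):
    """Chunk a 2D array of shape (num_frames, num_bins) into overlapping blocks."""
    num_bins = data.shape[1]
    blocks = []
    for start in range(0, data.shape[0], hop_size):
        block = data[start:start + block_size]
        if block.shape[0] < block_size:
            if not drop_partial_block:
                # Assumption: zero-pad the first incomplete block to full length.
                pad = np.zeros((block_size - block.shape[0], num_bins), dtype=data.dtype)
                blocks.append(np.concatenate([block, pad], axis=0))
            break  # every later start would also be incomplete
        blocks.append(block)
    if not blocks:
        return np.empty((0, block_size, num_bins), dtype=data.dtype)
    return np.stack(blocks)  # shape: (num_blocks, block_size, num_bins)


data = np.arange(20).reshape(10, 2)
print(blocking_sketch(data, block_size=4, hop_size=2).shape)
```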

aisfx.preprocessing.compute_hopSize(block_size, hop_multiplier, data)

Compute hop size as a portion of block_size.

Use this to set the amount of overlap between consecutive blocks.

Parameters

block_size (int) – Block size.
hop_multiplier (float) – The amount of block size to be used as hop size.
data (np.ndarray) – Data that is being blocked.

Returns

The computed hop size.

Return type

int
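A minimal sketch of the hop-size arithmetic, assuming the hop is the block size scaled by the multiplier and truncated to a whole number of frames. hop_size_sketch is a hypothetical name; how the real function uses its data argument is not documented here, so the sketch omits it.

```python
def hop_size_sketch(block_size, hop_multiplier):
    """Return the hop in frames as a fraction of block_size (at least 1 frame)."""
    if hop_multiplier <= 0:
        raise ValueError("hop_multiplier must be positive")
    # Truncate to whole frames; clamp to 1 so blocking always advances.
    return max(1, int(block_size * hop_multiplier))


block_size = 100
hop = hop_size_sketch(block_size, 0.5)
overlap = block_size - hop  # frames shared by consecutive blocks
print(hop, overlap)
```

With a multiplier of 0.5, consecutive blocks share half of their frames; a multiplier of 1.0 gives no overlap.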

aisfx.preprocessing.spectrogram_normalize(spec, spec_norm='original', mn=None, mx=None)

Normalize spectrograms.

Uses MinMax normalization.

Parameters

spec (np.ndarray) – Spectrogram data.
spec_norm (str) – ‘original’ uses same cross-dataset training values as in ISMIR paper, ‘local’ uses min/max of spec, ‘user’ allows one to set their own mn and mx value using the respective arguments.
mn (float) – Minimum spectrogram value for normalization. Will be ignored unless spec_norm=’user’.
mx (float) – Maximum spectrogram value for normalization. Will be ignored unless spec_norm=’user’.

Returns

The normalized spectrogram.

Return type

np.ndarray
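The MinMax step can be sketched as below for the ‘local’ and ‘user’ modes. The function name is hypothetical, and the ‘original’ mode is deliberately left out because the training-set minimum and maximum are not stated in this reference.

```python
import numpy as np


def minmax_normalize_sketch(spec, spec_norm="local", mn=None, mx=None):
    """Scale spec with (spec - mn) / (mx - mn)."""
    if spec_norm == "local":
        # Use the minimum and maximum of this spectrogram only.
        mn, mx = float(spec.min()), float(spec.max())
    elif spec_norm == "user":
        if mn is None or mx is None:
            raise ValueError("spec_norm='user' requires both mn and mx")
    else:
        # 'original' relies on values from cross-dataset training, not reproduced here.
        raise ValueError("this sketch supports only 'local' and 'user'")
    if mx == mn:
        raise ValueError("mn and mx must differ")
    return (spec - mn) / (mx - mn)


spec = np.array([[0.0, 5.0], [10.0, 2.5]])
print(minmax_normalize_sketch(spec, "local"))
```

Note that with ‘user’ bounds, values outside [mn, mx] map outside [0, 1]; the sketch does not clip them.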

aisfx.utils

aisfx.utils.cuda_select(use_cuda=True)

Return torch.device using cpu or CUDA.

Device chosen based on (i) user-desired device and (ii) CUDA availability.

Parameters: use_cuda (bool) – User-desired device. By default, CUDA is used if available.
Returns: ‘cpu’ or ‘cuda’.
Return type: str
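The two-part decision above (user preference and CUDA availability) can be sketched as follows. select_device_sketch and its cuda_available argument are hypothetical; the argument exists only so the logic can be exercised without a GPU, and the sketch falls back to the CPU when PyTorch is not installed.

```python
def select_device_sketch(use_cuda=True, cuda_available=None):
    """Return 'cuda' only if the user wants it and CUDA is available, else 'cpu'."""
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            # Assumption for this sketch: without PyTorch, treat CUDA as unavailable.
            cuda_available = False
    return "cuda" if (use_cuda and cuda_available) else "cpu"


print(select_device_sketch(use_cuda=False))
```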

aisfx.utils.get_audioPaths(directory)

Return a list of audio files to process.

Recursively collect all paths in directory to parse.

The directory must contain only audio files.

Parameters: directory (str) – Path to the folder containing audio files to process.
Returns: The list of audio files to process.
Return type: list[str]
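A recursive collection of this kind can be sketched with the standard library. collect_paths_sketch is a hypothetical name; sorting the result is an assumption made so that the order is reproducible. Like the function described above, the sketch does no filtering by file type, which is why the directory must hold only audio files.

```python
import os


def collect_paths_sketch(directory):
    """Recursively gather the path of every file under directory."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)  # sorted for a stable processing order (assumption)
```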

aisfx.utils.spectrogram_selector(spectrogram_type)

Select a spectrogram computation method.

Choices are Essentia or Librosa.

Parameters: spectrogram_type (str) – Compute spectrograms with ‘essentia’ or ‘librosa’ library.
Returns: Function to compute the spectrograms.
Return type: fn
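Returning a function from a string choice is commonly done with a dictionary lookup, as in the sketch below. All names here are hypothetical, and the two placeholder functions stand in for the real Essentia and Librosa spectrogram routines, which are not shown in this reference.

```python
def spectrogram_essentia_placeholder(audio_path):
    """Placeholder for an Essentia-based spectrogram routine (not implemented here)."""
    raise NotImplementedError


def spectrogram_librosa_placeholder(audio_path):
    """Placeholder for a Librosa-based spectrogram routine (not implemented here)."""
    raise NotImplementedError


def select_spectrogram_fn_sketch(spectrogram_type):
    """Map the spectrogram_type string to the function that computes spectrograms."""
    choices = {
        "essentia": spectrogram_essentia_placeholder,
        "librosa": spectrogram_librosa_placeholder,
    }
    if spectrogram_type not in choices:
        raise ValueError("spectrogram_type must be 'essentia' or 'librosa'")
    return choices[spectrogram_type]


compute_spec = select_spectrogram_fn_sketch("librosa")
print(compute_spec.__name__)
```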