API Reference
For details on how the function above is called, see the reference below.
aisfx.inference
- aisfx.inference.get_path_modelWeights()
Return relative path to aiSFX package with best model weights.
Path to .pickle file encompassing the best ray.tune checkpoint of model X-Sequential-CE.
- Returns
Path to weights.
- Return type
str
- aisfx.inference.load_bestModel(weights, ds_dict, cuda)
Return best model: X-Sequential-CE.
Re-loads the best CrossNet model, X-Sequential-CE, with saved weights on the specified torch.device().
- Parameters
weights (torch.model.state_dict) – PyTorch model weights.
ds_dict (dict) – Dictionary of dataset names and number of classes for all datasets used in cross-dataset training.
cuda (str) – ‘cpu’ or ‘cuda’.
- Returns
PyTorch model.
- Return type
nn.Module
- aisfx.inference.load_weights(cuda)
Load PyTorch model weights.
Loads weights using the path to a saved model checkpoint.
- Parameters
cuda (str) – ‘cpu’ or ‘cuda’.
- Returns
PyTorch model weights, together with a dictionary of dataset names and number of classes for all datasets used in cross-dataset training.
- Return type
torch.model.state_dict
- aisfx.inference.main(dir_audio, dir_export, spectrogram_type='essentia', spec_norm='original', norm_mn=None, norm_mx=None, drop_partial_block=True, emb_hop_size=0.5, use_cuda=True)
Extract and save embeddings from all files in a directory.
Uses the X-Sequential-CE model from the ISMIR paper.
- Parameters
dir_audio (str) – Path to audio directory for processing.
dir_export (str) – Path to directory for saving extracted embeddings.
spectrogram_type (str) – Compute spectrograms with ‘essentia’ or ‘librosa’ library. Original ISMIR paper uses ‘essentia’.
emb_hop_size (float) – Embedding hop size, a multiplier of the embedding block size. Original ISMIR paper uses 0.5 (1s hop).
use_cuda (bool) – Whether to use CUDA when it is available; otherwise the CPU is used. Defaults to True.
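A minimal usage sketch based on the signature above. The wrapper name `extract_embeddings` and the `MAIN_KWARGS` dictionary are hypothetical helpers introduced for illustration; only `aisfx.inference.main` and its keyword arguments come from this reference. The import sits inside the function so the snippet can be loaded without the package installed.

```python
# Keyword arguments mirroring the documented defaults of aisfx.inference.main.
MAIN_KWARGS = {
    "spectrogram_type": "essentia",  # or "librosa"
    "spec_norm": "original",
    "norm_mn": None,
    "norm_mx": None,
    "drop_partial_block": True,
    "emb_hop_size": 0.5,
    "use_cuda": True,
}


def extract_embeddings(dir_audio, dir_export, **overrides):
    """Hypothetical wrapper: run aisfx.inference.main on one directory."""
    from aisfx import inference  # requires the aisfx package to be installed

    # Start from the documented defaults and apply any caller overrides.
    kwargs = {**MAIN_KWARGS, **overrides}
    inference.main(dir_audio, dir_export, **kwargs)
    return kwargs
```

With this wrapper, a call such as `extract_embeddings("audio/", "embeddings/", use_cuda=False)` would request CPU processing; both directory names are placeholders.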
- aisfx.inference.model_get_embedding(spec, emb_hop_size, drop_partial_block, model, cuda)
Extract embeddings using a PyTorch model.
Processes Tensor spectrogram input through PyTorch model on specified torch.device().
- Parameters
spec (np.ndarray) – 2D Spectrogram data.
emb_hop_size (float) – Embedding hop size, a multiplier of the embedding block size.
drop_partial_block (bool) – Whether to drop the last block if incomplete.
model (torch.nn.Module) – PyTorch model.
cuda (str) – ‘cpu’ or ‘cuda’.
- Returns
The extracted embeddings, np.ndarray([num_frames, 512]), where num_frames is the number of frames based on EMB_BLOCK_LENGTH and emb_hop_size, and 512 is the embedding dimensionality.
- Return type
np.ndarray
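The first dimension of the returned array depends on how many blocks fit into the spectrogram. The sketch below estimates that count for the case `drop_partial_block=True`. The function name `count_full_blocks` and the rule that the hop is the block length times `emb_hop_size`, truncated to an integer, are assumptions made for illustration and may differ from the package's own computation.

```python
def count_full_blocks(num_spec_frames, block_length, emb_hop_size):
    """Count the complete blocks that fit into num_spec_frames frames.

    Assumption: hop (in frames) = int(block_length * emb_hop_size), at least 1.
    """
    hop = max(1, int(block_length * emb_hop_size))
    if num_spec_frames < block_length:
        return 0  # not even one complete block
    # One block at offset 0, plus one for every further hop that still fits.
    return (num_spec_frames - block_length) // hop + 1


# A 10-frame spectrogram with 4-frame blocks and a 0.5 hop multiplier
# gives block starts at frames 0, 2, 4 and 6.
n_blocks = count_full_blocks(10, 4, 0.5)
```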
aisfx.preprocessing
- aisfx.preprocessing.blocking(data, block_size, hop_size, drop_partial_block)
Block spectrograms.
Chunk the 2D data based on block_size and hop_size.
- Parameters
data (np.array) – 2D numpy array to process.
block_size (int) – Block length, in frames.
hop_size (int) – Hop between the starts of consecutive blocks, in frames.
drop_partial_block (bool) – Whether to drop the last block if incomplete.
- Returns
The chunked data, np.ndarray([num_blocks, block_size, data.shape[1]]).
- Return type
np.ndarray
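The chunking described above can be sketched in plain Python on a list of frames. This is not the package's implementation: the name `block_frames` is hypothetical, and zero-padding a single trailing block when `drop_partial_block` is False is an assumption, since the reference does not say how an incomplete block is completed.

```python
def block_frames(data, block_size, hop_size, drop_partial_block=True):
    """Split a list of frames (rows) into blocks of block_size frames."""
    if block_size <= 0 or hop_size <= 0:
        raise ValueError("block_size and hop_size must be positive")
    blocks = []
    start = 0
    # Collect every block that fits entirely inside the data.
    while start + block_size <= len(data):
        blocks.append([list(row) for row in data[start:start + block_size]])
        start += hop_size
    # Optionally keep one trailing block that runs past the end of the data.
    if not drop_partial_block and start < len(data):
        width = len(data[0])
        tail = [list(row) for row in data[start:]]
        # Assumption: complete the block with all-zero frames.
        tail += [[0.0] * width for _ in range(block_size - len(tail))]
        blocks.append(tail)
    return blocks


frames = [[float(i), i + 0.5] for i in range(5)]  # 5 frames, 2 bins each
full_only = block_frames(frames, block_size=2, hop_size=2)
with_tail = block_frames(frames, block_size=2, hop_size=2, drop_partial_block=False)
```

Here `full_only` holds two complete blocks, while `with_tail` adds a third block made of the last frame followed by one zero frame.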
- aisfx.preprocessing.compute_hopSize(block_size, hop_multiplier, data)
Compute hop size as a portion of block_size.
Use this to set the amount of overlap between consecutive blocks.
- Parameters
block_size (int) – Block size.
hop_multiplier (float) – Fraction of the block size to use as the hop size.
data (np.ndarray) – Data that is being blocked.
- Returns
The computed hop size.
- Return type
int
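As a rough illustration of the relationship between block size, multiplier, and hop, the sketch below converts a multiplier into a hop in frames. The name `hop_from_multiplier` is hypothetical, truncation to an integer with a floor of one frame is an assumption, and the role of the `data` argument in the real function is not described here, so it is left out.

```python
def hop_from_multiplier(block_size, hop_multiplier):
    """Return a hop size in frames as a fraction of block_size."""
    if block_size <= 0 or hop_multiplier <= 0:
        raise ValueError("block_size and hop_multiplier must be positive")
    # Assumption: truncate to an integer and never return a zero hop.
    return max(1, int(block_size * hop_multiplier))


hop = hop_from_multiplier(100, 0.5)  # half-block hop
overlap = 100 - hop                  # frames shared by neighbouring blocks
```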
- aisfx.preprocessing.spectrogram_normalize(spec, spec_norm='original', mn=None, mx=None)
Normalize spectrograms.
Uses MinMax normalization.
- Parameters
spec (np.ndarray) – Spectrogram data.
spec_norm (str) – ‘original’ uses same cross-dataset training values as in ISMIR paper, ‘local’ uses min/max of spec, ‘user’ allows one to set their own mn and mx value using the respective arguments.
mn (float) – Minimum spectrogram value for normalization. Will be ignored unless spec_norm=’user’.
mx (float) – Maximum spectrogram value for normalization. Will be ignored unless spec_norm=’user’.
- Returns
The normalized spectrogram.
- Return type
np.ndarray
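MinMax normalization maps a value v to (v - mn) / (mx - mn). The sketch below shows the ‘local’ and ‘user’ modes on a nested list. The name `minmax_normalize` is hypothetical, and the ‘original’ mode is deliberately not reproduced because its training-set constants are not given in this reference.

```python
def minmax_normalize(spec, spec_norm="local", mn=None, mx=None):
    """Scale a 2D list of values with MinMax normalization."""
    if spec_norm == "local":
        # Use the minimum and maximum of the spectrogram itself.
        flat = [value for row in spec for value in row]
        mn, mx = min(flat), max(flat)
    elif spec_norm == "user":
        if mn is None or mx is None:
            raise ValueError("mn and mx are required when spec_norm='user'")
    else:
        # The 'original' constants are not listed in this reference.
        raise ValueError("unsupported spec_norm in this sketch: %r" % spec_norm)
    span = mx - mn
    if span == 0:
        raise ValueError("mx and mn must differ")
    return [[(value - mn) / span for value in row] for row in spec]


local = minmax_normalize([[0.0, 5.0], [10.0, 2.5]])
user = minmax_normalize([[1.0, 2.0]], spec_norm="user", mn=0.0, mx=4.0)
```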
aisfx.utils
- aisfx.utils.cuda_select(use_cuda=True)
Return torch.device using cpu or CUDA.
Device chosen based on (i) user-desired device and (ii) CUDA availability.
- Parameters
use_cuda (bool) – User-desired device. By default, CUDA is used if available.
- Returns
‘cpu’ or ‘cuda’.
- Return type
str
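The two-part decision described above (requested device, then availability) can be sketched as follows. The name `select_device` and the extra `cuda_available` parameter are hypothetical; the parameter exists only so the logic can be exercised without a GPU or without PyTorch installed.

```python
def select_device(use_cuda=True, cuda_available=None):
    """Return 'cuda' only if it is both requested and available."""
    if cuda_available is None:
        try:
            import torch  # optional dependency in this sketch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return "cuda" if (use_cuda and cuda_available) else "cpu"
```

The returned string can be passed to `torch.device(...)` when a device object is needed.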
- aisfx.utils.get_audioPaths(directory)
Return a list of audio files to process.
Recursively collect all paths in directory to parse.
The directory must contain only audio files.
- Parameters
directory (str) – Path to the folder containing audio files to process.
- Returns
The list of audio files to process.
- Return type
list[str]
- aisfx.utils.spectrogram_selector(spectrogram_type)
Select a spectrogram computation method.
Choices are Essentia or Librosa.
- Parameters
spectrogram_type (str) – Compute spectrograms with ‘essentia’ or ‘librosa’ library.
- Returns
Function to compute the spectrograms.
- Return type
fn