Tutorial

[ ]:
import sys
import os

# Make the local aisfx source tree importable when running this notebook from the repository.
sys.path.insert(0, os.path.abspath('../../../src'))

In your conda environment, extracting embeddings as described in the ISMIR paper only requires calling aisfx.inference.main() on a directory containing only audio files. No other function needs to be called. Input audio pre-processing is handled for you, i.e., down-mixing to one channel, normalization, and resampling to 44100 Hz.

[5]:
from aisfx import inference

path_dirAudio = input('Directory of audio files to process: ')
path_dirExport = input('Directory in which to store the extracted embeddings: ')
inference.main(path_dirAudio, path_dirExport,
               emb_hop_size=0.5,
               use_cuda=True)
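
Once the call returns, the export directory contains the extracted embeddings, one file per input audio file. The cell below is a minimal sketch for inspecting the output; it assumes the embeddings are written as NumPy .npy arrays with one embedding vector per analysis block, so check your export directory for the actual file names and format.

[ ]:
import os
import numpy as np

# Assumption: one .npy file per audio file, shaped (num_blocks, embedding_dim).
# Verify against the contents of your export directory.
for fname in sorted(os.listdir(path_dirExport)):
    if fname.endswith('.npy'):
        emb = np.load(os.path.join(path_dirExport, fname))
        print(fname, emb.shape)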

Feel free to modify the hop size for computing embeddings.

[7]:
from aisfx import inference

""" The input to the model is approximately 2s of audio. So,
emb_hop_size = 0.5  (hop by 1s)
emb_hop_size = 0.05  (hop by 0.1s)
"""

inference.main(path_dirAudio, path_dirExport,
               emb_hop_size=0.05,
               use_cuda=True)
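
If you have a target hop in seconds instead, you can derive emb_hop_size from the block length. The cell below is a small sketch based on the approximately 2 s input block noted above; block_seconds is an assumed round value, not one read from the library.

[ ]:
# Assumption: the model's input block is ~2 s, per the note above.
block_seconds = 2.0
desired_hop_seconds = 0.25

inference.main(path_dirAudio, path_dirExport,
               emb_hop_size=desired_hop_seconds / block_seconds,  # 0.125
               use_cuda=True)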

Other parameters to modify include:

  • Computing spectrograms with Essentia vs. Librosa (Librosa may yield lower-quality results, since the models were originally trained with Essentia): spectrogram_type

  • Changing the normalization of spectrograms before input to the model for embedding extraction: spec_norm, norm_mn, norm_mx

  • Dropping partial blocks when computing embeddings: drop_partial_block

  • Running the model on CPU or with CUDA support: use_cuda

[ ]:
from aisfx import inference

inference.main(path_dirAudio, path_dirExport,
               spectrogram_type='essentia',
               spec_norm='original', norm_mn=None, norm_mx=None,
               drop_partial_block=True,
               emb_hop_size=0.5,
               use_cuda=True)
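
If you are unsure whether a GPU is available on the machine running this notebook, you can set use_cuda from the environment instead of hard-coding it. The cell below is a sketch that assumes a PyTorch backend (i.e., that torch is installed alongside aisfx); if that assumption does not hold, simply pass use_cuda=False.

[ ]:
import torch  # assumed to be installed as an aisfx dependency

from aisfx import inference

# Fall back to CPU automatically when no CUDA device is present.
inference.main(path_dirAudio, path_dirExport,
               emb_hop_size=0.5,
               use_cuda=torch.cuda.is_available())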