Tutorial
[ ]:
import sys
import os
sys.path.insert(0, os.path.abspath('../../../src'))
In your conda environment, to extract embeddings as described in the ISMIR paper, simply call aisfx.inference.main()
on a directory containing only audio files. No other function needs to be called. Input audio pre-processing is handled for you, i.e., down-mixing to 1 channel, normalizing, and re-sampling to 44100 Hz.
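For intuition, that pre-processing can be sketched roughly as follows. This is a minimal illustration using NumPy and SciPy, not the library's actual implementation; the exact normalization scheme used internally may differ.

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(audio, sr, target_sr=44100):
    """Rough sketch of the pre-processing described above:
    down-mix to mono, peak-normalize, resample to 44100 Hz.
    Hypothetical helper, not part of aisfx."""
    # Down-mix: average channels if the signal is multi-channel.
    if audio.ndim > 1:
        audio = audio.mean(axis=0)
    # Peak-normalize to [-1, 1] (the library's exact scheme may differ).
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak
    # Resample to the target rate.
    if sr != target_sr:
        audio = resample_poly(audio, target_sr, sr)
    return audio

# Example: 1 s of stereo noise at 22050 Hz -> mono at 44100 Hz.
stereo = np.random.randn(2, 22050)
mono = preprocess(stereo, 22050)
print(mono.shape)  # (44100,)
```

Since inference.main() does all of this internally, none of the above is required in practice.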
[5]:
from aisfx import inference
path_dirAudio = input('Directory to audio-files for processing: ')
path_dirExport = input('Directory to store extracted embeddings: ')
inference.main(path_dirAudio, path_dirExport,
               emb_hop_size=0.5,
               use_cuda=True)
Feel free to modify the hop size for computing embeddings.
[7]:
from aisfx import inference
""" The input to the model is approximately 2s of audio. So,
emb_hop_size = 0.5 (hop by 1s)
emb_hop_size = 0.05 (hop by 0.1s)
"""
inference.main(path_dirAudio, path_dirExport,
emb_hop_size=0.05,
use_cuda=True)
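To make the hop arithmetic concrete: assuming the ~2 s input window mentioned above, emb_hop_size acts as a fraction of that window, so the hop in seconds is simply the product. The helper below is hypothetical, for illustration only, and not part of aisfx.

```python
WINDOW_S = 2.0  # approximate model input duration, as noted above

def hop_seconds(emb_hop_size, window_s=WINDOW_S):
    """Convert the fractional hop into seconds of audio.
    Hypothetical helper, not part of aisfx."""
    return emb_hop_size * window_s

print(hop_seconds(0.5))   # 1.0 -> roughly one embedding per second
print(hop_seconds(0.05))  # 0.1 -> roughly ten embeddings per second
```

Smaller hops give a finer-grained embedding sequence at the cost of more computation.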
Other parameters you can modify include:
- spectrogram_type: compute spectrograms with Essentia or Librosa (Librosa may yield lower-quality results, as the models were initially trained with Essentia spectrograms)
- spec_norm, norm_mn, norm_mx: change how spectrograms are normalized before being passed to the model for embedding extraction
- drop_partial_block: drop partial blocks when computing embeddings
- use_cuda: run the model on CPU or with CUDA support
[ ]:
from aisfx import inference
inference.main(path_dirAudio, path_dirExport,
               spectrogram_type='essentia',
               spec_norm='original', norm_mn=None, norm_mx=None,
               drop_partial_block=True,
               emb_hop_size=0.5,
               use_cuda=True)
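Once extraction finishes, the embeddings are ordinary vectors, so standard similarity search applies downstream (e.g. for indexing a sound-effects library). The sketch below assumes you have loaded an embeddings matrix of shape (num_frames, dim) from your export directory (for example with numpy.load, depending on how the files are stored); the ranking helper is hypothetical, not part of aisfx.

```python
import numpy as np

def rank_by_similarity(query, embeddings):
    """Return row indices of `embeddings` sorted by cosine similarity
    to `query`, most similar first. Hypothetical helper, not aisfx API."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q  # cosine similarity of each row against the query
    return np.argsort(-sims)

# Toy stand-in for real extracted embeddings: 4 vectors of dimension 8.
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))
order = rank_by_similarity(emb[2], emb)
print(order[0])  # the query's own row ranks first: 2
```

Replace the toy array with your actual loaded embeddings to rank frames or files by similarity.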