Music segmentation can be seen as a change point detection task and therefore can be carried out with ruptures.
Roughly, it consists in finding the temporal boundaries of meaningful sections, e.g. the intro, verse, chorus and outro in a song.
This is an important task in the field of music information retrieval.
The adopted approach is summarized as follows:
- the original sound is transformed into an informative (multivariate) representation;
- mean shifts are detected in this new representation using a dynamic programming approach.
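As a rough illustration of the second step, here is a minimal sketch of mean-shift detection by dynamic programming on a synthetic two-dimensional signal. It is not the ruptures implementation: the function names `segment_cost` and `detect_mean_shifts`, the synthetic data, and the assumption that the number of change points is known in advance are all chosen for illustration only.

```python
import numpy as np


def segment_cost(cumsum, cumsum_sq, start, end):
    """Sum of squared deviations from the mean over signal[start:end]."""
    n = end - start
    s = cumsum[end] - cumsum[start]
    s2 = cumsum_sq[end] - cumsum_sq[start]
    return float(np.sum(s2 - s**2 / n))


def detect_mean_shifts(signal, n_bkps):
    """Return the segment end indices minimising the total squared-error cost."""
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    n_samples, n_dims = signal.shape
    # prefix sums make the cost of any segment an O(n_dims) computation
    cumsum = np.vstack([np.zeros(n_dims), np.cumsum(signal, axis=0)])
    cumsum_sq = np.vstack([np.zeros(n_dims), np.cumsum(signal**2, axis=0)])

    n_segments = n_bkps + 1
    # best[k, t]: minimal cost of splitting signal[:t] into k segments
    best = np.full((n_segments + 1, n_samples + 1), np.inf)
    previous = np.zeros((n_segments + 1, n_samples + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for t in range(k, n_samples + 1):
            for s in range(k - 1, t):
                cost = best[k - 1, s] + segment_cost(cumsum, cumsum_sq, s, t)
                if cost < best[k, t]:
                    best[k, t] = cost
                    previous[k, t] = s

    # backtrack from the end of the signal to recover the segment ends
    bkps = []
    t = n_samples
    for k in range(n_segments, 0, -1):
        bkps.append(int(t))
        t = previous[k, t]
    return sorted(bkps)


# synthetic signal (an assumption): three regimes with different means
rng = np.random.default_rng(0)
means = np.repeat([[0.0, 0.0], [5.0, -2.0], [1.0, 4.0]], [20, 15, 25], axis=0)
signal = means + 0.2 * rng.standard_normal(means.shape)

bkps = detect_mean_shifts(signal, n_bkps=2)
print(bkps)  # [20, 35, 60]
```

The last index in the returned list is the signal length, so the two detected mean shifts are at samples 20 and 35. The triple loop is quadratic in the number of samples, which is fine for a sketch but not for long signals.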
In this example, we use the well-known tempogram representation, which is based on the onset strength envelope of the input signal, and captures tempo information [Grosche2010].
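To give some intuition for why such a representation reacts to tempo changes, the sketch below computes a simplified local autocorrelation of a synthetic onset envelope. It is an illustration only, not librosa's implementation (which, among other things, applies windowing and normalization); the helper name `local_autocorrelation`, the window length, and the click pattern are assumptions made for this example.

```python
import numpy as np


def local_autocorrelation(envelope, win_length):
    """Autocorrelation of `envelope` in a sliding window centred on each frame.

    Returns an array of shape (win_length, n_frames): rows are lags,
    columns are frames.
    """
    envelope = np.asarray(envelope, dtype=float)
    half = win_length // 2
    # zero-pad so that every frame has a full window around it
    padded = np.pad(envelope, (half, win_length - half - 1))
    out = np.empty((win_length, envelope.size))
    for t in range(envelope.size):
        window = padded[t : t + win_length]
        # keep non-negative lags only (lag 0 sits at index win_length - 1)
        out[:, t] = np.correlate(window, window, mode="full")[win_length - 1 :]
    return out


# hypothetical onset envelope: one click every 8 frames, then one every 5 frames
envelope = np.zeros(200)
envelope[0:100:8] = 1.0
envelope[100:200:5] = 1.0

tempogram_like = local_autocorrelation(envelope, win_length=64)

# dominant non-zero lag before and after the tempo change
lag_before = int(np.argmax(tempogram_like[1:, 50])) + 1
lag_after = int(np.argmax(tempogram_like[1:, 150])) + 1
print(lag_before, lag_after)  # 8 5
```

The lag profile differs between the two regimes, and a change of this kind in the representation is what the detection step then looks for.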
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Audio, display

import ruptures as rpt  # our package
We can also define a utility function.
def fig_ax(figsize=(15, 5), dpi=150):
    """Return a (matplotlib) figure and ax objects with given size."""
    return plt.subplots(figsize=figsize, dpi=dpi)
duration = 30  # in seconds
signal, sampling_rate = librosa.load(librosa.ex("nutcracker"), duration=duration)

# listen to the music
display(Audio(data=signal, rate=sampling_rate))

# look at the envelope
fig, ax = fig_ax()
ax.plot(np.arange(signal.size) / sampling_rate, signal)
ax.set_xlim(0, signal.size / sampling_rate)
ax.set_xlabel("Time (s)")
_ = ax.set(title="Sound envelope")