Boundary Detection in Music Structure Analysis using Convolutional Neural Networks

This page hosts supplementary material for
Karen Ullrich, Jan Schlüter, Thomas Grill: Boundary Detection in Music Structure Analysis using Convolutional Neural Networks. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 2014, Taipei, Taiwan.


For better reproducibility, we publish the code used for our evaluation:


Again, for better reproducibility, we provide the ground truth for our test set and the predictions of all entries to MIREX 2012 and 2013 on our test set. Both were copied from the MIREX 2012 and 2013 results and converted into a format supported by the evaluation script above.

Furthermore, we provide the predictions of our MIREX 2014 entry on the test set. This entry uses the same training methodology and architecture as in the paper, but with twice as many feature maps in the two convolutional layers.

Finally, we provide the mapping of the MIREX test set files to the ids in the public SALAMI dataset that we could identify by matching their annotations: salami_ids.txt

Update: We also downloaded and converted the MIREX 2014 predictions. These were not included in the paper, but are provided here for easy comparison.


Example spectrogram and network output (The Weight by Rachel Weber, SALAMI id 1304). For every spectrogram time frame, the network computes an output value. Concatenating all values, we obtain a curve of boundary probabilities for the entire music piece (blue). Local maxima of this curve are boundary candidates, and thresholding them selects the boundary predictions (red, dotted). Ground-truth annotations are shown as short vertical bars (green).
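
The peak-picking step described in the caption can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: the function name `pick_boundaries`, the threshold of 0.5, and the frame rate are assumptions made for the example, and the actual implementation may treat plateaus, smoothing, or the threshold differently.

```python
import numpy as np

def pick_boundaries(probs, threshold=0.5, frame_rate=1.0):
    """Return times (in seconds) of local maxima of `probs` that reach `threshold`.

    `probs` holds one boundary probability per spectrogram frame.
    `threshold` and `frame_rate` (frames per second) are illustrative values,
    not taken from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.size < 3:
        # Too short to contain an interior local maximum.
        return np.array([])
    inner = probs[1:-1]
    # Boundary candidates: frames strictly greater than both neighbours.
    # (Flat plateaus are therefore ignored in this simple sketch.)
    is_peak = (inner > probs[:-2]) & (inner > probs[2:])
    peak_frames = np.flatnonzero(is_peak) + 1
    # Thresholding the candidates selects the boundary predictions.
    selected = peak_frames[probs[peak_frames] >= threshold]
    # Convert frame indices to seconds.
    return selected / frame_rate

# Toy curve: peaks at frames 2, 5 and 8; the one at frame 5 is below threshold.
curve = [0.1, 0.2, 0.9, 0.3, 0.1, 0.4, 0.2, 0.1, 0.7, 0.6]
print(pick_boundaries(curve, threshold=0.5, frame_rate=2.0))  # [1. 4.]
```

Raising the threshold yields fewer predicted boundaries and lowering it yields more, so in practice it would be tuned on held-out data rather than fixed at the value used here.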