On High Dimensional Data Analysis in Music Information Retrieval

A project sponsored by the Austrian National Science Foundation (FWF)
Project Number: P27082

Learning in high dimensional spaces poses a number of challenges, collectively referred to as the curse of dimensionality. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, very often relies on high dimensional feature representations and models. A new aspect of the curse of dimensionality, the so-called hubness, was first documented and established in MIR as a problem of computing music similarity. Hub songs are, according to the music similarity function, similar to very many other songs and consequently appear in very many recommendation lists, preventing other songs from being recommended at all. The hubness phenomenon has since been identified as a general problem of machine learning in high dimensional spaces. It is due to the property of distance concentration, which causes all points in a high dimensional data space to be at almost the same distance to each other.
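To make the two notions above concrete, the following minimal sketch measures them on synthetic data. It counts how often each point appears in the k-nearest-neighbor lists of the other points (its k-occurrence), summarizes the skewness of these counts as a hubness indicator, and computes a simple relative contrast of distances as a concentration indicator. The Gaussian data, the function names, and the parameter values are illustrative assumptions and are not taken from the project's software.

```python
import math
import random


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def k_occurrence(dist, k):
    """Count how often each point is among the k nearest neighbors of the others."""
    n = len(dist)
    counts = [0] * n
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; large positive values suggest a few strong hubs."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def relative_contrast(dist):
    """(max - min) / min over all pairs; shrinks as distances concentrate.

    Assumes no two points coincide, so the minimum distance is non-zero.
    """
    n = len(dist)
    flat = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    return (max(flat) - min(flat)) / min(flat)


def demo(n=200, dims=(2, 100), k=5, seed=0):
    """Hypothetical experiment: i.i.d. Gaussian points in low and high dimension."""
    rng = random.Random(seed)
    results = {}
    for d in dims:
        points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        dist = pairwise_distances(points)
        counts = k_occurrence(dist, k)
        results[d] = (skewness(counts), relative_contrast(dist), max(counts))
    return results


if __name__ == "__main__":
    for d, (skew, contrast, max_count) in demo().items():
        print(f"dim={d}: skewness={skew:.2f}, contrast={contrast:.2f}, "
              f"max k-occurrence={max_count}")
```

In a recommendation setting, a point with a very large k-occurrence corresponds to a hub song that shows up in many recommendation lists, while points with a count of zero are never recommended.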

Our previous research has focused on the impact of distance concentration and hubness on nearest neighbor based music recommendation and genre classification. As a result, we have developed a general unsupervised method to pre-process and rescale distance spaces, which decisively diminishes hubness and its adverse effects in music databases as well as in general machine learning datasets. Research by our group and by others has also made it clear that concentration and hubness affect many more distance based algorithms used in high dimensional data analysis. The proposed project will explore existing approaches and develop new ones to deal with these problems by studying their effects on a wide range of methods in MIR, but also in multimedia and machine learning. In particular, we plan to (i) study and unify rescaling methods to avoid distance concentration, and (ii) explore the role of hubness in unsupervised (clustering, visualization) and supervised (classification) learning in high dimensional spaces.
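The paragraph above does not spell out how a distance space is rescaled. As an illustration only, the sketch below implements an empirical form of mutual proximity, a rescaling technique whose name appears in the publication titles on this page. For each pair of points it counts how many other points are farther away from both of them than they are from each other, and turns that share into a secondary distance. The normalization by n - 2 and the toy distance matrix are assumptions made for this example; this is not presented as the exact method used in the project.

```python
def mutual_proximity(dist):
    """Rescale a symmetric distance matrix into secondary distances in [0, 1].

    Sketch of empirical mutual proximity. Requires at least three points.
    Runs in O(n^3), so it is only meant for small examples.
    """
    n = len(dist)
    mp = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d_xy = dist[x][y]
            # Count points j that are farther from x than y is
            # AND farther from y than x is.
            shared = sum(
                1
                for j in range(n)
                if j != x and j != y and dist[x][j] > d_xy and dist[y][j] > d_xy
            )
            # Assumption: normalize by the n - 2 remaining points and
            # convert the resulting similarity into a distance.
            mp[x][y] = mp[y][x] = 1.0 - shared / (n - 2)
    return mp


if __name__ == "__main__":
    # Toy distance matrix for four points on a line at 0, 1, 2 and 10.
    toy = [
        [0.0, 1.0, 2.0, 10.0],
        [1.0, 0.0, 1.0, 9.0],
        [2.0, 1.0, 0.0, 8.0],
        [10.0, 9.0, 8.0, 0.0],
    ]
    for row in mutual_proximity(toy):
        print(row)
```

Two points end up close in the rescaled space only if each regards the other as a near neighbor, which is why such a rescaling can reduce the influence of hubs that are close to everything from one side only.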

The main focus of this project is on MIR, since this is where the majority of results on hubness and concentration exist. Evaluating our results in the broader fields of multimedia and machine learning will ensure that our research has the potential to solve both an important problem in MIR and a general problem of learning in high dimensional spaces.


Feldbauer R., Flexer A.: A comprehensive empirical comparison of hubness reduction in high-dimensional spaces, Knowledge and Information Systems, published online 18th of May, 2018. DOI: https://doi.org/10.1007/s10115-018-1205-y

Feldbauer R., Flexer A.: Centering versus Scaling for Hubness Reduction, in Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN'16), Part I, pp. 175-183, Springer International Publishing, 2016. also available as: TR-2016-05.

Feldbauer R., Leodolter M., Plant C., Flexer A.: Fast approximate hubness reduction for large high-dimensional data, Proceedings of the IEEE International Conference on Big Knowledge (ICBK), 2018. also available as: TR-2018-02.

Flexer A.: Hubness-aware outlier detection for music genre recognition, in Proceedings of the 19th International Conference on Digital Audio Effects (DAFx-16), pp. 69-75, 2016. also available as: TR-2016-09.

Flexer A.: An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection, in Proceedings of 4th International Workshop on High Dimensional Data Mining (HDM), in conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2016), Barcelona, Spain, 2016. also available as: TR-2016-10.

Flexer A.: Improving visualization of high-dimensional music similarity spaces, 16th International Society for Music Information Retrieval Conference, Malaga, Spain, 2015. also available as: TR-2015-03.

Flexer A.: The impact of hubness on music recommendation, Machine Learning for Music Discovery Workshop at the 32nd International Conference on Machine Learning, Lille, France, 2015. also available as: TR-2015-02.

Flexer A.: On inter-rater agreement in audio music similarity, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR'14), Taipei, Taiwan, 2014. also available as: TR-2014-06.

Flexer A., Grill T.: The Problem of Limited Inter-rater Agreement in Modelling Music Similarity, Journal of New Music Research, Vol. 45, No. 3, pp. 239-251, 2016. DOI: http://dx.doi.org/10.1080/09298215.2016.1200631

Flexer A., Schnitzer D.: Choosing lp norms in high-dimensional spaces based on hub analysis, Neurocomputing, Volume 169, pp. 281-287, 2015. DOI: http://dx.doi.org/10.1016/j.neucom.2014.11.084

Flexer A., Stevens J.: Mutual proximity graphs for improved reachability in music recommendation, Journal of New Music Research, Vol. 47, No. 1, pp. 17-28, 2018 (published online 3rd of August, 2017). DOI: http://dx.doi.org/10.1080/09298215.2017.1354891

Flexer A., Stevens J.: Mutual proximity graphs for music recommendation, Proceedings of the 9th International Workshop on Machine Learning and Music, Riva del Garda, Italy, 2016. also available as: TR-2016-06.

Schnitzer D., Flexer A.: The Unbalancing Effect of Hubs on K-medoids Clustering in High-Dimensional Spaces, Proceedings of the International Joint Conference on Neural Networks, Killarney, Ireland, 2015. also available as: TR-2015-01.


We provide a free open source software package for the Python programming environment which implements many hubness analysis and reduction algorithms. Please visit the GitHub page for source code, development versions, issue tracking, and contribution possibilities.

A MATLAB version of the Hub-Toolbox providing core functionality is also available on GitHub.

Previous Research on Hubness

Please see the information on our previous project on “Preventing Hubness in Music Information Retrieval”.

Additional sponsoring

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.