On High Dimensional Data Analysis in Music Information Retrieval

Learning in high dimensional spaces poses a number of challenges that are collectively referred to as the curse of dimensionality. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, very often relies on high dimensional feature representations and models. A new aspect of the curse of dimensionality, the so-called hubness, was first documented and established in MIR as a problem of computing music similarity. Hub songs are, according to the music similarity function, similar to very many other songs. As a consequence, they appear in very many recommendation lists and prevent other songs from being recommended at all. The hubness phenomenon has since been identified as a general problem of machine learning in high dimensional spaces. It is due to distance concentration, the property that causes all points in a high dimensional data space to be at almost the same distance from each other.
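Both effects can be observed on synthetic data. The following is a minimal sketch, not the measurement protocol of this project: it assumes i.i.d. Gaussian points and Euclidean distances, and all function names are hypothetical. It computes a relative contrast (how far apart the nearest and farthest neighbor of a point are) and the k-occurrence of each point (how often it appears in the k-nearest-neighbor lists of other points). The skewness of the k-occurrence distribution is a commonly used summary of hubness.

```python
import math
import random


def pairwise_distances(points):
    """Return the symmetric Euclidean distance matrix as a list of lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def relative_contrast(dist):
    """Mean of (farthest - nearest) / nearest over all points.

    Values near zero indicate distance concentration.
    Assumes no duplicate points (nearest distance > 0).
    """
    n = len(dist)
    total = 0.0
    for i in range(n):
        others = [dist[i][j] for j in range(n) if j != i]
        total += (max(others) - min(others)) / min(others)
    return total / n


def k_occurrence(dist, k):
    """Count how often each point is among the k nearest neighbors of another point."""
    n = len(dist)
    counts = [0] * n
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        candidates.sort(key=lambda j: dist[i][j])
        for j in candidates[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; large positive values hint at hubs."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 200):
        # Synthetic stand-in for feature vectors; not real music data.
        points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
        dist = pairwise_distances(points)
        counts = k_occurrence(dist, k=5)
        print(dim, relative_contrast(dist), skewness(counts), max(counts))
```

In this toy setting, points with a large k-occurrence play the role of hub songs: they occupy many neighbor lists, which leaves fewer slots for everything else.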

Our own previous research has focused on the impact of distance concentration and hubness on nearest neighbor based music recommendation and genre classification. As a result, we have developed a general unsupervised method to pre-process and rescale distance spaces, which decisively diminishes hubness and its adverse effects in music databases as well as in general machine learning datasets. Research by our own and other groups has also made it clear that concentration and hubness affect many more distance based algorithms used in high dimensional data analysis. The proposed project will explore existing approaches and develop new ones to deal with these problems by studying their effects on a wide range of methods in MIR, as well as in multimedia and machine learning. In particular, we plan to (i) study and unify rescaling methods to avoid distance concentration, and (ii) explore the role of hubness in unsupervised learning (clustering, visualization) and supervised learning (classification) in high dimensional spaces.
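The paragraph above does not name the rescaling method. As an illustration only, the sketch below implements an empirical variant of mutual proximity, a term that appears in the publication titles listed on this page. It rests on stated assumptions: the input is a full symmetric distance matrix, and two points count as close when many other points are farther away from both of them. The function name and the normalization by n - 2 are choices made for this example, not details taken from the project.

```python
def mutual_proximity(dist):
    """Rescale a symmetric distance matrix into secondary distances in [0, 1].

    For a pair (x, y) with distance d, count the other points j that are
    farther than d from both x and y. A large count means x and y are
    mutually close, so the secondary distance 1 - count / (n - 2) is small.
    Requires at least three points.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three points")
    rescaled = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            d = dist[x][y]
            shared = sum(
                1
                for j in range(n)
                if j != x and j != y and dist[x][j] > d and dist[y][j] > d
            )
            value = 1.0 - shared / (n - 2)
            rescaled[x][y] = rescaled[y][x] = value
    return rescaled


if __name__ == "__main__":
    # Three points on a line at positions 0, 1 and 10 (toy data).
    toy = [
        [0.0, 1.0, 10.0],
        [1.0, 0.0, 9.0],
        [10.0, 9.0, 0.0],
    ]
    for row in mutual_proximity(toy):
        print(row)
```

The rescaled matrix can be passed to any nearest neighbor routine in place of the original distances. The loop is cubic in the number of points, so this form is only practical for small collections.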

The main focus of this project is on MIR, since this is where the majority of results on hubness and concentration exist. Evaluating our results in the broader fields of multimedia and machine learning will ensure that our research has the potential to solve an important problem in MIR and, at the same time, a general problem of learning in high dimensional spaces.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.


  • Feldbauer R., Flexer A.: A comprehensive empirical comparison of hubness reduction in high-dimensional spaces, Knowledge and Information Systems, published online 18th of May, 2018. DOI: https://doi.org/10.1007/s10115-018-1205-y
  • Feldbauer R., Flexer A.: Centering versus Scaling for Hubness Reduction, in Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN'16), Part I, pp. 175-183, Springer International Publishing, 2016.
  • Feldbauer R., Leodolter M., Plant C., Flexer A.: Fast approximate hubness reduction for large high-dimensional data, Proceedings of the IEEE International Conference on Big Knowledge (ICBK), 2018.
  • Flexer A.: Hubness-aware outlier detection for music genre recognition, in Proceedings of the 19th International Conference on Digital Audio Effects (DAFx-16), pp. 69-75, 2016.
  • Flexer A.: An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection, in Proceedings of 4th International Workshop on High Dimensional Data Mining (HDM), in conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2016), Barcelona, Spain, 2016.
  • Flexer A.: Improving visualization of high-dimensional music similarity spaces, 16th International Society for Music Information Retrieval Conference, Malaga, Spain, 2015.
  • Flexer A.: The impact of hubness on music recommendation, Machine Learning for Music Discovery Workshop at the 32nd International Conference on Machine Learning, Lille, France, 2015.
  • Flexer A.: On inter-rater agreement in audio music similarity, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR'14), Taipei, Taiwan, 2014.
  • Flexer A., Grill T.: The Problem of Limited Inter-rater Agreement in Modelling Music Similarity, Journal of New Music Research, Vol. 45, No. 3, pp. 239-251, 2016. DOI: http://dx.doi.org/10.1080/09298215.2016.1200631
  • Flexer A., Schnitzer D.: Choosing lp norms in high-dimensional spaces based on hub analysis, Neurocomputing, Vol. 169, pp. 281-287, 2015. DOI: http://dx.doi.org/10.1016/j.neucom.2014.11.084
  • Flexer A., Stevens J.: Mutual proximity graphs for improved reachability in music recommendation, Journal of New Music Research, Vol. 47, No. 1, pp. 17-28, 2018 (published online 3rd of August, 2017). DOI: http://dx.doi.org/10.1080/09298215.2017.1354891
  • Flexer A., Stevens J.: Mutual proximity graphs for music recommendation, Proceedings of the 9th International Workshop on Machine Learning and Music, Riva del Garda, Italy, 2016.
  • Schnitzer D., Flexer A.: The Unbalancing Effect of Hubs on K-medoids Clustering in High-Dimensional Spaces, Proceedings of the International Joint Conference on Neural Networks, Killarney, Ireland, 2015.
