OFAI-TR-2013-04

Can Shared Nearest Neighbors Reduce Hubness in High-Dimensional Spaces?

Arthur Flexer, Dominik Schnitzer

'Hubness' is a recently discovered general problem of machine learning in high-dimensional data spaces. Hub objects have a small distance to an exceptionally large number of data points, while anti-hubs are far from all other data points. Hubness is related to the concentration of distances, which impairs the contrast of distances in high-dimensional spaces. Computing secondary distances inspired by shared nearest neighbor (SNN) approaches has been shown to reduce both hubness and concentration, and there is already some work on applying SNN directly to hubness in image recognition. This study applies SNN to a larger number of high-dimensional real-world data sets from diverse domains and compares it to two other secondary distance approaches (local scaling and mutual proximity). SNN is shown to reduce hubness, but less than the other approaches, and, unlike its competitors, it improves classification accuracy for only half of the data sets.
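A minimal sketch of how hubness is commonly quantified, assuming the usual convention from the hubness literature: count N_k(x), the number of times x occurs among the k nearest neighbors of the other points, and take the skewness of that distribution. Hubs have a large N_k, anti-hubs have N_k = 0, and a strongly positive skewness signals hubness. The function names and the brute-force neighbor search are hypothetical illustrations, not code from the report.

```python
import math
from typing import List, Sequence


def distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Primary (Euclidean) distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def k_occurrence(dist: List[List[float]], k: int) -> List[int]:
    """N_k(x): how often each point appears in the k-NN lists of the other points."""
    n = len(dist)
    counts = [0] * n
    for i in range(n):
        # Brute-force k-NN of point i, excluding i itself; ties broken by index.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        for j in others[:k]:
            counts[j] += 1
    return counts


def hubness(dist: List[List[float]], k: int) -> float:
    """Skewness of the N_k distribution; strongly positive values indicate hubs."""
    counts = k_occurrence(dist, k)
    n = len(counts)
    mean = sum(counts) / n  # equals k, since every point contributes k neighbors
    var = sum((c - mean) ** 2 for c in counts) / n
    if var == 0:
        return 0.0  # every point occurs equally often: no hubs, no anti-hubs
    third = sum((c - mean) ** 3 for c in counts) / n
    return third / var ** 1.5


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (-1.1, 0.0), (0.0, 1.2)]
    d = distance_matrix(pts)
    print(k_occurrence(d, 1))  # [3, 1, 0, 0]: point 0 is a hub
    print(round(hubness(d, 1), 4))  # 0.8165
```

In the usage example the centre point (0, 0) is the nearest neighbor of all three surrounding points, so it becomes a hub and the skewness is positive.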
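The SNN idea can be sketched as a secondary distance derived from a primary one: two points count as close when their lists of s nearest neighbors overlap. This sketch assumes the simple variant 1 - |NN_s(x) ∩ NN_s(y)| / s, with each list excluding the point itself; the report's exact formula may differ, other transforms of the overlap are also used in the literature, and the helper names and parameter s are hypothetical.

```python
import math
from typing import List, Sequence, Set


def distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Primary (Euclidean) distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def nearest_neighbor_sets(dist: List[List[float]], s: int) -> List[Set[int]]:
    """NN_s(x): indices of the s nearest neighbors of each point (excluding itself)."""
    n = len(dist)
    result = []
    for i in range(n):
        # Ties in distance are broken by index so the result is deterministic.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        result.append(set(others[:s]))
    return result


def snn_distance(dist: List[List[float]], s: int) -> List[List[float]]:
    """Secondary distance 1 - |NN_s(x) & NN_s(y)| / s (one simple SNN variant)."""
    nn = nearest_neighbor_sets(dist, s)
    n = len(dist)
    return [
        [0.0 if i == j else 1.0 - len(nn[i] & nn[j]) / s for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Two well-separated 1-D clusters.
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    d2 = snn_distance(distance_matrix(pts), s=2)
    print(d2[0][1], d2[0][3])  # 0.5 1.0
```

Because the secondary distance depends only on neighbor ranks, it is bounded in [0, 1] regardless of the scale of the primary distances.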
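Local scaling, one of the two competing secondary distances, rescales each distance by the local neighborhood radius of both points involved. A sketch, assuming the common form LS(d_xy) = 1 - exp(-d_xy^2 / (σ_x σ_y)), where σ_x is the distance from x to its k-th nearest neighbor; the function names are hypothetical, and duplicate points (which would give σ = 0) are not handled.

```python
import math
from typing import List, Sequence


def distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Primary (Euclidean) distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def local_scaling(dist: List[List[float]], k: int) -> List[List[float]]:
    """LS(d_xy) = 1 - exp(-d_xy^2 / (sigma_x * sigma_y)).

    sigma_x is the distance from x to its k-th nearest neighbor. Assumes no
    duplicate points, so every sigma is strictly positive.
    """
    n = len(dist)
    # Local radius of each point: distance to its k-th nearest neighbor.
    sigma = [sorted(dist[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    return [
        [
            0.0 if i == j else 1.0 - math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,)]
    ls = local_scaling(distance_matrix(pts), k=1)
    print(round(ls[0][1], 4), round(ls[1][2], 4))  # 0.6321 0.8647
```

Points in sparse regions get a large σ, so their distances shrink relative to points in dense regions, which is the mechanism by which local scaling is expected to counter hubness.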
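Mutual proximity re-expresses a distance as the probability that x and y are mutually closer to each other than to other points. A sketch of an empirical, count-based estimate, assuming MP(x, y) = |{j : d_xj > d_xy} ∩ {j : d_yj > d_yx}| / (n - 2) with j ranging over all points except x and y, and using 1 - MP as the secondary distance. The estimator used in the report may differ (for example, a parametric fit to the distance distributions), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Primary (Euclidean) distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def mutual_proximity_distance(dist: List[List[float]]) -> List[List[float]]:
    """1 - MP(x, y), with MP estimated empirically by counting (needs n >= 3).

    MP(x, y) = |{j : d_xj > d_xy} & {j : d_yj > d_yx}| / (n - 2),
    where j ranges over all points other than x and y.
    """
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            # Points j that are farther from x than y is AND farther from y than x is.
            shared = sum(
                1
                for j in range(n)
                if j != x and j != y
                and dist[x][j] > dist[x][y]
                and dist[y][j] > dist[y][x]
            )
            out[x][y] = out[y][x] = 1.0 - shared / (n - 2)
    return out


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (10.0,)]
    mp = mutual_proximity_distance(distance_matrix(pts))
    print(mp[0][1], mp[1][2], mp[0][3])  # 0.0 0.5 1.0
```

The result is symmetric by construction, and a pair is only considered close if each point is among the other's relatively near points.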
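Whether a secondary distance improves classification accuracy can be checked by running a nearest-neighbor classifier directly on a distance matrix. This sketch assumes leave-one-out k-NN classification with majority voting, a common protocol in hubness studies; it is not necessarily the exact evaluation used in the report, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def distance_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Primary (Euclidean) distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def loo_knn_accuracy(dist: List[List[float]], labels: Sequence[Hashable], k: int) -> float:
    """Leave-one-out k-NN accuracy computed from any (primary or secondary) distance matrix."""
    n = len(dist)
    correct = 0
    for i in range(n):
        # Neighbors of i among all other points; ties in distance broken by index.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        votes = Counter(labels[j] for j in others[:k])
        # most_common orders equal counts by first occurrence, so the nearer neighbor wins ties.
        predicted = votes.most_common(1)[0][0]
        if predicted == labels[i]:
            correct += 1
    return correct / n


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    labels = ["a", "a", "b", "b", "b", "b"]
    print(loo_knn_accuracy(distance_matrix(pts), labels, k=1))
```

Running it once on the primary distance matrix and once on a secondary distance matrix for the same labels gives the kind of before/after accuracy comparison the abstract summarizes.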

Keywords: Machine Learning, High-dimensional data, Hubness, Curse of dimensionality

Citation: Proceedings of the 1st International Workshop on High Dimensional Data Mining (HDM), in conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2013), Dallas, Texas