Global and Local Scaling Reduce Hubs in Space

November 5, 2013:
NOTE: An updated software package for hubness analysis is available at our project homepage: http://ofai.at/research/impml/projects/hubology.html

July 31, 2012

This is the main evaluation script for re-running the complete evaluation of the work submitted to JMLR. Matlab is needed to run the scripts.

Download mp_scripts-v2.zip (72MB)

The following datasets are included in the download

Usage

To run the evaluation, extract the files from the downloaded archive (mp_scripts-v2.zip), start Matlab, and call eval_mld('*'). Note that a full run takes about a day to complete. If the script is called with the second parameter set to true, as in eval_mld('*', 1), the computationally expensive Goodman-Kruskal Index is included in the computation.

To evaluate a single database, pass the desired collection as the parameter: eval_mld('corel-corel1000.db');

Then the Matlab output will look like:


Collection: corel1000 (n=1000)
size: 1000, classes: 10, dim: 192, intrinsic dim: 9
  Original (l_2)              - S^{k=1}: 1.83, C^{k=1}: 70.7%
                                S^{k=5}: 1.45, C^{k=5}: 65.2%
                                S^{k=20}: 1.52, C^{k=20}: 63.9%
                                SYMM^{k=5}: 35.8%, SYMM^{k=10%}: 42.1%

  NICDM                       - S^{k=1}: 1.00, C^{k=1}: 72.9%
                                S^{k=5}: 0.39, C^{k=5}: 72.0%
                                S^{k=20}: 0.63, C^{k=20}: 72.3%
                                SYMM^{k=5}: 69.8%, SYMM^{k=10%}: 70.0%

  MP (Empiric)                - S^{k=1}: 0.83, C^{k=1}: 71.6%
                                S^{k=5}: 0.31, C^{k=5}: 70.3%
                                S^{k=20}: 0.05, C^{k=20}: 69.0%
                                SYMM^{k=5}: 64.0%, SYMM^{k=10%}: 69.2%

As in the paper, S^{k=5} refers to the hubness, C^{k=1,5} to the classification accuracies, and SYMM^{k=5,10%} to the percentage of symmetric nearest neighbor relations.
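To make these three measures concrete, here is a minimal sketch of how they could be computed from a distance matrix. It is not taken from the evaluation scripts: the toy data, the variable names, and the exact conventions (hubness as the skewness of the k-occurrence counts, leave-one-out majority voting for the accuracy, and the share of reciprocated neighbor relations for SYMM) are assumptions made for illustration.

```matlab
% Illustrative sketch only, not code from the evaluation package.
% Toy data: 200 random points in 20 dimensions with 4 arbitrary class labels.
n = 200;
X = rand(n, 20);
labels = repmat((1:4)', n / 4, 1);
k = 5;

% Euclidean distance matrix; the diagonal is set to Inf so that a point
% never appears in its own neighbor list.
sq = sum(X .^ 2, 2);
D = sqrt(max(bsxfun(@plus, sq, sq') - 2 * (X * X'), 0));
D(1:n+1:end) = Inf;

% Indices of the k nearest neighbors of every point (one row per point).
[~, idx] = sort(D, 2);
knn = idx(:, 1:k);

% k-occurrence: how often each point appears in the neighbor lists of others.
Nk = accumarray(knn(:), 1, [n 1]);

% Hubness S^k, assumed here to be the skewness of the k-occurrence counts.
mu = mean(Nk);
sigma = sqrt(mean((Nk - mu) .^ 2));
S = mean((Nk - mu) .^ 3) / sigma ^ 3;

% C^k, assumed here to be leave-one-out k-NN accuracy with majority voting.
pred = mode(labels(knn), 2);
C = mean(pred == labels);

% SYMM^k, assumed here to be the fraction of neighbor relations x -> y
% for which y -> x also holds.
A = false(n);
A(sub2ind([n n], repmat((1:n)', 1, k), knn)) = true;
SYMM = sum(sum(A & A')) / sum(A(:));

fprintf('S^{k=%d}: %.2f, C^{k=%d}: %.1f%%, SYMM^{k=%d}: %.1f%%\n', ...
        k, S, k, 100 * C, k, 100 * SYMM);
```

Because the labels in this toy example are unrelated to the data, the accuracy is expected to be near chance level; the point is only to show how the quantities relate to the neighbor lists.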

Mutual Proximity

The Mutual Proximity function is called norm_mp_empiric() (in file norm/norm_mp_empiric.m) and can be used with any distance matrix.
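For orientation, the idea behind empirical Mutual Proximity can be sketched in a few lines. This is a simplified, self-contained illustration and not the contents of norm_mp_empiric.m: the toy data, the variable names, and the normalization by the collection size n are assumptions. For two objects x and y, the sketch counts how many objects are farther away from both x and y than x and y are from each other, and turns that share into a distance.

```matlab
% Illustrative sketch only, not the package's norm_mp_empiric.m.
% Toy data: 50 random points in 10 dimensions.
n = 50;
X = rand(n, 10);

% Symmetric Euclidean distance matrix with an exact zero diagonal.
sq = sum(X .^ 2, 2);
D = sqrt(max(bsxfun(@plus, sq, sq') - 2 * (X * X'), 0));
D = (D + D') / 2;
D(1:n+1:end) = 0;

% Empirical Mutual Proximity, rewritten as a distance (1 - MP).
Dmp = zeros(n);
for x = 1:n-1
    for y = x+1:n
        d = D(x, y);
        % Objects that are farther than d from x AND farther than d from y.
        both = sum(D(x, :) > d & D(y, :) > d);
        Dmp(x, y) = 1 - both / n;   % assumed normalization by n
        Dmp(y, x) = Dmp(x, y);
    end
end
```

The resulting matrix Dmp is symmetric with values in [0, 1] and can be used wherever the original distance matrix D was used, for example as input to a nearest neighbor search. The double loop is written for readability, not speed.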

Implemented variants of MP are:

 


Last Update: July 31, 2012