Localized Centering: Reducing Hubness in Large-Sample Data
Authors: Kazuo Hara, Ikumi Suzuki, Masashi Shimbo, Kei Kobayashi, Kenji Fukumizu, Miloš Radovanović
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results using synthetic data indicate that localized centering reduces the hubness not suppressed by classical centering. Using large real-world datasets, moreover, we show that the proposed method improves the performance of document classification with kNN insofar as it reduces hubness. |
| Researcher Affiliation | Academia | Kazuo Hara, kazuo.hara@gmail.com, National Institute of Genetics, Mishima, Shizuoka, Japan; Ikumi Suzuki, suzuki.ikumi@gmail.com, National Institute of Genetics, Mishima, Shizuoka, Japan; Masashi Shimbo, shimbo@is.naist.jp, Nara Institute of Science and Technology, Ikoma, Nara, Japan; Kei Kobayashi, kei@ism.ac.jp, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan; Kenji Fukumizu, fukumizu@ism.ac.jp, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan; Miloš Radovanović, radacha@dmi.uns.ac.rs, University of Novi Sad, Novi Sad, Serbia |
| Pseudocode | No | The paper describes mathematical formulations and processes but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a MATLAB script (norm_mp_empiric.m, distributed at http://ofai.at/~dominik.schnitzer/mp) for a baseline method, Mutual Proximity (MP), but neither provides the source code for the proposed method (Localized Centering) nor states that it is available. |
| Open Datasets | Yes | The datasets are WebKB, Reuters-52, and 20Newsgroups, all preprocessed and distributed by Cardoso-Cachopo (2007), and TDT2-30, distributed by Cai, He, and Han (2005). ... Cai, D.; He, X.; and Han, J. 2005. Document clustering using locality preserving indexing. IEEE Transactions on Knowledge and Data Engineering 17(12):1624-1637. Datasets available at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html. Cardoso-Cachopo, A. 2007. Improving Methods for Single-label Text Categorization. PhD thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa. Datasets available at http://web.ist.utl.pt/acardoso/datasets/. |
| Dataset Splits | Yes | To simulate a situation in which the number of training samples is large, we ignored the predefined training-test splits provided with the datasets. Instead, the performance was evaluated by the accuracy of the leave-one-out cross validation over all samples. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a "MATLAB script" for one of the methods (MP), but it does not specify any software versions (e.g., MATLAB version, or versions of any libraries or frameworks used). |
| Experiment Setup | Yes | We represented each document as a tf-idf weighted bag-of-words vector normalized to unit length. Throughout the experiment, inner product is used as the measure of similarity. ... The parameter κ in Local Affinity can be different from the parameter k of the kNN classification performed subsequently. Indeed, in later experiments, we will tune κ so as to maximize the correlation with the $N_{10}$ skewness, independently from the kNN classification. ... Parameter γ can be tuned so as to maximally reduce the skewness of the $N_k$ distribution. |
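
The preprocessing described in the Experiment Setup row (tf-idf weighting, unit-length normalization, inner-product similarity) can be sketched in plain Python as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the tf-idf variant (raw term counts multiplied by log(N/df)) is an assumption, since the summary does not state which weighting scheme was used. Because every non-zero vector has unit length, the inner product coincides with cosine similarity.

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """docs: list of token lists. Returns (vocab, vectors), each vector tf-idf
    weighted and scaled to unit Euclidean length.
    ASSUMPTION: tf = raw count, idf = log(N / df); the source gives no variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vocab = sorted(df)
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for term, tf in Counter(doc).items():
            vec[index[term]] = tf * math.log(n / df[term])
        norm = math.sqrt(sum(w * w for w in vec))
        if norm > 0:  # leave all-zero vectors untouched
            vec = [w / norm for w in vec]
        vectors.append(vec)
    return vocab, vectors


def inner_product(u, v):
    """Similarity measure used throughout; equals cosine for unit vectors."""
    return sum(a * b for a, b in zip(u, v))


# Usage on a toy corpus (hypothetical data).
docs = [["hub", "node"], ["hub", "graph"], ["text", "mining"]]
vocab, V = tfidf_unit_vectors(docs)
print(inner_product(V[0], V[1]))  # documents 0 and 1 share "hub", so positive
```

Under this idf variant a term that occurs in every document gets weight zero, so a document made only of such terms stays the zero vector instead of being normalized.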
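
Hubness is commonly quantified by the skewness of the $N_k$ distribution, where $N_k(x)$ counts how often $x$ appears among the k nearest neighbors of the other samples. The sketch below measures that skewness and applies a localized-centering penalty whose exponent γ is chosen to reduce it, with κ set independently of k as the Experiment Setup row describes. The exact formulas are not given in this summary, so the code assumes that Local Affinity $LA_\kappa(x)$ is the mean inner product between $x$ and its κ most similar samples, and that the localized-centering score of a candidate $x$ for a query $q$ is $\langle x, q\rangle - LA_\kappa(x)^\gamma$. All function names are hypothetical.

```python
def dot(u, v):
    """Inner product, the similarity used throughout the experiments."""
    return sum(a * b for a, b in zip(u, v))


def skewness(values):
    """Standardized third moment; strongly positive values signal hubs."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def local_affinity(X, kappa):
    """ASSUMED form: mean inner product of X[i] with its kappa most similar points."""
    la = []
    for i, x in enumerate(X):
        sims = sorted((dot(x, X[j]) for j in range(len(X)) if j != i), reverse=True)
        la.append(sum(sims[:kappa]) / kappa)
    return la


def k_occurrence(X, k, penalty):
    """N_k: how often each X[j] lands in another point's k-NN list when
    candidates are ranked by <X[j], q> - penalty[j]."""
    counts = [0] * len(X)
    for i, q in enumerate(X):
        others = [j for j in range(len(X)) if j != i]  # a point is not its own neighbor
        others.sort(key=lambda j: dot(X[j], q) - penalty[j], reverse=True)
        for j in others[:k]:
            counts[j] += 1
    return counts


def tune_gamma(X, k, kappa, gammas):
    """Return (gamma, skewness) for the gamma giving the smallest N_k skewness,
    using the ASSUMED penalty LA_kappa(x) ** gamma."""
    la = local_affinity(X, kappa)
    best = None
    for g in gammas:
        penalty = [max(a, 0.0) ** g for a in la]  # clamp: fractional powers need a >= 0
        s = skewness(k_occurrence(X, k, penalty))
        if best is None or s < best[1]:
            best = (g, s)
    return best


# Usage on hypothetical unit vectors: baseline skewness, then the tuned gamma.
X = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
print(skewness(k_occurrence(X, 1, [0.0] * len(X))))
print(tune_gamma(X, k=1, kappa=1, gammas=[0.0, 0.5, 1.0]))
```

Setting γ = 0 makes every penalty equal to 1, which leaves all rankings unchanged, so including γ = 0 in the grid keeps the plain inner-product baseline among the candidates.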
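
The evaluation protocol in the Dataset Splits row (leave-one-out cross-validation over all samples instead of the predefined splits) amounts to classifying each sample by a kNN majority vote over all remaining samples. The following is a minimal sketch under two assumptions the summary does not specify: plain majority voting, and ties broken in favor of the label encountered first in the similarity-ranked neighbor list. `loo_knn_accuracy` and its `sim` hook are hypothetical names.

```python
from collections import Counter


def dot(u, v):
    """Inner product similarity."""
    return sum(a * b for a, b in zip(u, v))


def loo_knn_accuracy(X, y, k, sim=None):
    """Leave-one-out accuracy of kNN majority voting over all samples.
    sim(j, q) scores sample j against query vector q (default: inner product).
    ASSUMPTION: vote ties go to the label seen first among ranked neighbors,
    which is how Counter.most_common orders equal counts."""
    if sim is None:
        sim = lambda j, q: dot(X[j], q)
    correct = 0
    for i, q in enumerate(X):
        others = [j for j in range(len(X)) if j != i]  # hold out sample i
        others.sort(key=lambda j: sim(j, q), reverse=True)
        votes = Counter(y[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == y[i]
    return correct / len(X)


# Usage on hypothetical data with two well-separated classes.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = ["a", "a", "b", "b"]
print(loo_knn_accuracy(X, y, k=1))
```

To score a penalized similarity instead, pass something like `sim=lambda j, q: dot(X[j], q) - penalty[j]`, where `penalty` is a hypothetical list of per-sample penalties.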