reproducibilityindex.ai

Localized Centering: Reducing Hubness in Large-Sample Data

Authors: Kazuo Hara, Ikumi Suzuki, Masashi Shimbo, Kei Kobayashi, Kenji Fukumizu, Miloš Radovanović

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results using synthetic data indicate that localized centering reduces the hubness not suppressed by classical centering. Using large real-world datasets, moreover, we show that the proposed method improves the performance of document classification with kNN insofar as it reduces hubness.
Researcher Affiliation	Academia	Kazuo Hara, kazuo.hara@gmail.com, National Institute of Genetics, Mishima, Shizuoka, Japan; Ikumi Suzuki, suzuki.ikumi@gmail.com, National Institute of Genetics, Mishima, Shizuoka, Japan; Masashi Shimbo, shimbo@is.naist.jp, Nara Institute of Science and Technology, Ikoma, Nara, Japan; Kei Kobayashi, kei@ism.ac.jp, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan; Kenji Fukumizu, fukumizu@ism.ac.jp, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan; Miloš Radovanović, radacha@dmi.uns.ac.rs, University of Novi Sad, Novi Sad, Serbia
Pseudocode	No	The paper describes mathematical formulations and processes but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper mentions using a MATLAB script (norm_mp_empiric.m, distributed at http://ofai.at/~dominik.schnitzer/mp) for a baseline method (Mutual Proximity), but it neither provides the source code for the proposed method (Localized Centering) nor states that it is available.
Open Datasets	Yes	The datasets are: WebKB, Reuters-52, and 20Newsgroups, all preprocessed and distributed by Cardoso-Cachopo (2007), and TDT2-30 distributed by Cai, He, and Han (2005). ... Cai, D.; He, X.; and Han, J. 2005. Document clustering using locality preserving indexing. IEEE Transactions on Knowledge and Data Engineering 17(12):1624–1637. Datasets available at http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html. Cardoso-Cachopo, A. 2007. Improving Methods for Single-label Text Categorization. PhD thesis, Instituto Superior Tecnico, Universidade Tecnica de Lisboa. Datasets available at http://web.ist.utl.pt/acardoso/datasets/.
Dataset Splits	Yes	To simulate a situation in which the number of training samples is large, we ignored the predefined training-test splits provided with the datasets. Instead, the performance was evaluated by the accuracy of the leave-one-out cross validation over all samples.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using a "MATLAB script" for one of the methods (MP), but it does not specify any software versions (e.g., MATLAB version, or versions of any libraries or frameworks used).
Experiment Setup	Yes	We represented each document as a tf-idf weighted bag-of-word vector normalized to unit length. Throughout the experiment, inner product is used as the measure of similarity. ... The parameter κ in Local Affinity can be different from the parameter k of the kNN classification performed subsequently. Indeed, in later experiments, we will tune κ so as to maximize the correlation with the N_10 skewness, independently from the kNN classification. ... Parameter γ can be tuned so as to maximally reduce the skewness of the N_k distribution.
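
The setup and evaluation quoted in the table can be sketched in a few lines. This is a minimal illustration, not the authors' code. It builds unit-length tf-idf vectors, uses the inner product as the similarity, computes the skewness of the N_k distribution (how often each sample appears in other samples' k-nearest-neighbor lists), and scores kNN by leave-one-out accuracy. The function names, the log(n/df) idf variant, index-order tie-breaking, and majority voting are all assumptions, since the summary does not specify them.

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """Map token lists to sparse tf-idf vectors (dicts) of unit L2 norm."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term frequency times log(n / df).
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors


def inner(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_lists(vectors, k):
    """For each sample, the indices of its k most similar other samples."""
    n = len(vectors)
    result = []
    for i in range(n):
        sims = [(inner(vectors[i], vectors[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties broken by lower index (an assumption).
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        result.append([j for _, j in sims[:k]])
    return result


def nk_skewness(neighbors, n):
    """Skewness of N_k, the count of k-NN lists each sample appears in."""
    counts = [0] * n
    for neighbor_list in neighbors:
        for j in neighbor_list:
            counts[j] += 1
    mean = sum(counts) / n
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def loo_accuracy(neighbors, labels):
    """Leave-one-out kNN accuracy: each sample is classified by majority
    vote among its nearest neighbors, which never include the sample itself."""
    correct = 0
    for i, neighbor_list in enumerate(neighbors):
        votes = Counter(labels[j] for j in neighbor_list)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(labels)
```

Given token lists `docs` and class labels `labels`, calling `knn_lists(tfidf_unit_vectors(docs), 10)` and passing the result to `nk_skewness` and `loo_accuracy` yields the two quantities the summary refers to. A large positive skewness indicates that a few samples (hubs) dominate the neighbor lists. The pairwise loop is quadratic in the number of documents, so this sketch is suitable only for small collections.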