Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Product Quantized Translation for Fast Nearest Neighbor Search

Authors: Yoonho Hwang, Mooyeol Baek, Saehoon Kim, Bohyung Han, Hee-Kap Ahn

AAAI 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Although our framework is composed of simple operations only, it achieves the state-of-the-art performance compared to existing nearest neighbor search techniques, which is illustrated quantitatively using various large-scale benchmark datasets in different sizes and dimensions.
Researcher Affiliation	Collaboration	Yoonho Hwang, Mooyeol Baek, Saehoon Kim, Bohyung Han, Hee-Kap Ahn Dept. of Computer Science and Engineering POSTECH, Korea EMAIL S.Kim is also affiliated with AItrics.
Pseudocode	Yes	The pseudocode of our algorithm is presented in Algorithm 1.
Open Source Code	No	The paper states 'We use the source codes released by authors for the implementations of the external algorithms', but it does not provide any explicit statement or link indicating that the source code for their own proposed methodology is publicly available.
Open Datasets	Yes	We perform the experiments on four independent datasets, which are denoted by MNIST (Lecun et al. 1998), SIFT5M (Jégou, Douze, and Schmid 2011), GIST1M (Jégou, Douze, and Schmid 2011), and MSCOCO (Lin et al. 2014).
Dataset Splits	No	The paper states that MS-COCO 'contains 4,096-dimensional vectors, which are feature descriptors for 123,287 images in training and validation sets', but it does not specify explicit training/validation/test split percentages, sample counts, or the methodology for data partitioning needed to reproduce the experiment across all datasets or how the validation set was utilized in this specific experimental setup.
Hardware Specification	Yes	All tested algorithms are implemented in C++ in Linux (Fedora 21, g++ 4.9.2), specifically using a single core on Intel Core i7-5820k@3.30Ghz with 64GB main memory.
Software Dependencies	Yes	All tested algorithms are implemented in C++ in Linux (Fedora 21, g++ 4.9.2), specifically using a single core on Intel Core i7-5820k@3.30Ghz with 64GB main memory.
Experiment Setup	Yes	By default, the number of clusters and the dimensionality of partitions are set to 64 and 32, respectively.