Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions

Authors: Xian-Ling Mao, Bo-Si Feng, Yi-Jing Hao, Liqiang Nie, Heyan Huang, Guihua Wen

AAAI 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.
Researcher Affiliation	Academia	Department of Computer Science, Beijing Institute of Technology, China; Department of Computing, National University of Singapore, Singapore; Department of Computer Science and Technology, South China University of Technology, China
Pseudocode	No	The paper includes mathematical formulations and derivations but does not present pseudocode or an algorithm block.
Open Source Code	Yes	We have released our codes to facilitate other researchers to repeat our experiments and validate their own ideas. https://www.dropbox.com/s/2yral5h23lwzipp/src.zip?dl=0
Open Datasets	Yes	Six publicly available image datasets, namely CIFAR10, CIFAR100-20, CIFAR100-100, Local-Patch, MNIST and COVTYPE, and two crawled text datasets are used to compare the proposed approach against state-of-the-art methods. CIFAR10 dataset consists of 60K 32x32 colour images in 10 classes. CIFAR-100 is just like the CIFAR-10, except that it has 20 coarse and 100 fine superclasses, denoted as CIFAR100-20 and CIFAR100-100. Local-Patch contains roughly 300K 32x32 image patches. MNIST consists of a total of 70000 handwritten digit samples. COVTYPE is a common benchmark featuring 54 dimensions.
Dataset Splits	Yes	All the experimental results are averaged over 10 random training/test partitions. For each partition, we randomly select 100 points with their tags as queries, and the remaining points and tags as reference database.
Hardware Specification	Yes	All experiments are conducted on our workstation with Intel(R) Xeon(R) CPU X7560@2.27GHz and 32G memory.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers for reproducibility.
Experiment Setup	Yes	Figure 2 shows the effect of the partition interval W in S2JSD-LSH hash functions (Eq.(11)) at different code size on the CIFAR100-100 and MNIST. As we can see, the trend of mAP values decreases when W changes from 0.1 to 1.0, and our method can achieve the best accuracy synthetically when W = 0.2 on both datasets. Similar trends have been observed over other datasets. In the following experiments, we set parameter W = 0.2 for S2JSD-LSH.
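
The protocol quoted in the Dataset Splits row (10 random training/test partitions, each with 100 randomly selected query points and the remaining points as the reference database) could be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names `make_partitions` and `averaged_score`, the `seed` parameter, and the `evaluate` callback are all hypothetical, and the paper excerpt does not state how the random selection was seeded or implemented.

```python
import random


def make_partitions(n_points, n_queries=100, n_partitions=10, seed=0):
    """Yield (query_indices, database_indices) pairs, one per random partition.

    n_queries=100 and n_partitions=10 follow the quoted protocol;
    the seed is an assumption added here for repeatability.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    for _ in range(n_partitions):
        shuffled = indices[:]          # copy so each partition is drawn independently
        rng.shuffle(shuffled)
        # First n_queries points act as queries, the rest as the reference database.
        yield shuffled[:n_queries], shuffled[n_queries:]


def averaged_score(n_points, evaluate, **kwargs):
    """Average a user-supplied metric (for example mAP) over all partitions.

    `evaluate` is a hypothetical callback taking (query_indices, database_indices)
    and returning a float.
    """
    scores = [evaluate(q, db) for q, db in make_partitions(n_points, **kwargs)]
    return sum(scores) / len(scores)
```

A caller would pass its own retrieval metric as `evaluate`; the averaging step mirrors the statement that all reported results are averaged over the 10 partitions.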