Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions

Authors: Xian-Ling Mao, Bo-Si Feng, Yi-Jing Hao, Liqiang Nie, Heyan Huang, Guihua Wen

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, extensive empirical evaluations on six public image datasets and two text datasets demonstrate the effectiveness of the proposed hashing schema in terms of mean Average Precision (mAP), Precision@N and precision-recall curves.
Researcher Affiliation | Academia | Department of Computer Science, Beijing Institute of Technology, China; Department of Computing, National University of Singapore, Singapore; Department of Computer Science and Technology, South China University of Technology, China
Pseudocode | No | The paper includes mathematical formulations and derivations but does not present pseudocode or an algorithm block.
Open Source Code | Yes | We have released our code to help other researchers repeat our experiments and validate their own ideas. Footnote 1: https://www.dropbox.com/s/2yral5h23lwzipp/src.zip?dl=0
Open Datasets | Yes | Six publicly available image datasets, namely CIFAR10, CIFAR100-20, CIFAR100-100, Local-Patch, MNIST and COVTYPE, and two crawled text datasets are used to compare the proposed approach against state-of-the-art methods. The CIFAR-10 dataset consists of 60K 32x32 colour images in 10 classes. CIFAR-100 is just like CIFAR-10, except that it has 20 coarse superclasses and 100 fine classes; the two labelings are denoted CIFAR100-20 and CIFAR100-100. Local-Patch contains roughly 300K 32x32 image patches. MNIST consists of a total of 70,000 handwritten digit samples. COVTYPE is a common benchmark with 54 dimensions.
Dataset Splits | Yes | All experimental results are averaged over 10 random training/test partitions. For each partition, we randomly select 100 points with their tags as queries and use the remaining points and tags as the reference database.
Hardware Specification | Yes | All experiments are conducted on our workstation with an Intel(R) Xeon(R) CPU X7560 @ 2.27GHz and 32 GB of memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducibility.
Experiment Setup | Yes | Figure 2 shows the effect of the partition interval W in the S2JSD-LSH hash functions (Eq. (11)) at different code sizes on CIFAR100-100 and MNIST. As we can see, mAP generally decreases as W increases from 0.1 to 1.0, and our method achieves the best overall accuracy at W = 0.2 on both datasets. Similar trends have been observed on the other datasets. In the following experiments, we set W = 0.2 for S2JSD-LSH.
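The Research Type row names mean Average Precision and Precision@N as the evaluation metrics. The following is a minimal sketch of one common way to compute them for hashing-based retrieval. It assumes that a database item counts as relevant when it shares the query's label and that items are ranked by Hamming distance, with ties broken by database index. All function names are hypothetical, and the paper's exact protocol (for example, any truncation depth for mAP) is not stated in this summary.

```python
import numpy as np


def hamming_ranking(query_code, db_codes, query_label, db_labels):
    """Rank database items by Hamming distance to the query code and
    return a 0/1 relevance vector in ranked order.

    Assumption: relevant means "same label as the query"; ties in
    Hamming distance keep database order (stable sort).
    """
    dist = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dist, kind="stable")
    return (db_labels[order] == query_label).astype(int)


def precision_at_n(relevant, n):
    """Precision@N: number of relevant items among the top n, divided by n."""
    if n <= 0:
        raise ValueError("n must be positive")
    relevant = np.asarray(relevant, dtype=float)
    return float(np.sum(relevant[:n]) / n)


def average_precision(relevant):
    """AP of one ranked list: mean of precision@k over the ranks k that hold
    a relevant item (untruncated; 0.0 if nothing is relevant)."""
    relevant = np.asarray(relevant, dtype=float)
    num_relevant = relevant.sum()
    if num_relevant == 0:
        return 0.0
    hits = np.cumsum(relevant)                 # relevant items seen up to rank k
    ranks = np.arange(1, relevant.size + 1)    # k = 1, 2, ...
    return float(np.sum((hits / ranks) * relevant) / num_relevant)


def mean_average_precision(ranked_relevance_lists):
    """mAP: average of the per-query AP values."""
    return float(np.mean([average_precision(r) for r in ranked_relevance_lists]))


# Usage: one query with 3-bit codes against a 3-item database.
rel = hamming_ranking(np.array([0, 1, 1]),
                      np.array([[0, 1, 1], [1, 0, 0], [0, 1, 0]]),
                      5, np.array([5, 5, 7]))
print(rel, precision_at_n(rel, 2), average_precision(rel))
```

For the text datasets, where points carry tags rather than single labels, relevance would more plausibly mean sharing at least one tag. That variant only changes the comparison inside `hamming_ranking`.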
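The Dataset Splits row describes the evaluation protocol: 10 random partitions, each using 100 randomly chosen points as queries and the remaining points as the reference database. The following minimal sketch follows that protocol. The per-partition seeding scheme and the `evaluate` callback are assumptions made for illustration, since the summary does not say how partitions were seeded.

```python
from typing import Callable, Optional, Tuple

import numpy as np


def random_query_split(num_points: int, num_queries: int = 100,
                       seed: Optional[int] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Randomly pick `num_queries` point indices as queries; the remaining
    indices form the reference database."""
    if not 0 < num_queries < num_points:
        raise ValueError("need 0 < num_queries < num_points")
    perm = np.random.default_rng(seed).permutation(num_points)
    return perm[:num_queries], perm[num_queries:]


def average_over_partitions(evaluate: Callable[[np.ndarray, np.ndarray], float],
                            num_points: int, num_partitions: int = 10,
                            num_queries: int = 100, base_seed: int = 0) -> float:
    """Average a score over independent random query/database partitions.

    `evaluate(query_idx, db_idx)` is a hypothetical callback that would hash
    the database, answer the queries and return a score such as mAP.
    Partition i uses seed base_seed + i (an assumed seeding scheme).
    """
    scores = [
        evaluate(*random_query_split(num_points, num_queries, seed=base_seed + i))
        for i in range(num_partitions)
    ]
    return float(np.mean(scores))
```

Using a distinct, recorded seed per partition is the design choice that makes the "averaged over 10 random partitions" figure re-runnable. Without it, only the distribution of results, not the exact numbers, can be reproduced.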
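S2JSD-LSH is built for the S2JSD distance between probability distributions. As the name suggests, the sketch below assumes S2JSD is the square root of twice the Jensen-Shannon divergence, where JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) and M = ½(P + Q). It uses natural logarithms, which is also an assumption; changing the log base only rescales the result by a constant.

```python
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute 0 by convention.
    Assumes q_i > 0 wherever p_i > 0 (true when q is the JSD midpoint)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def jensen_shannon(p, q) -> float:
    """JSD(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m) with m = (p + q) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def s2jsd(p, q) -> float:
    """Assumed definition of S2JSD: sqrt(2 * JSD(p, q))."""
    # max(..., 0.0) guards against tiny negative round-off before the sqrt.
    return float(np.sqrt(max(2.0 * jensen_shannon(p, q), 0.0)))


print(s2jsd([0.5, 0.5, 0.0], [0.1, 0.4, 0.5]))
```

The square root of the JSD is known to be a metric, so S2JSD is one as well because it is a constant multiple of that metric. This property is what makes it a natural target for a locality-sensitive hashing scheme.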
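The Experiment Setup row tunes the partition interval W of the S2JSD-LSH hash functions (Eq. (11)). Eq. (11) is not reproduced in this summary. The sketch below therefore shows only the standard p-stable (E2LSH-style) template h(x) = floor((a·x + b) / W), with a drawn from a standard Gaussian and b uniform on [0, W), to illustrate what a partition interval controls. A smaller W gives finer buckets, so fewer nearby points collide. Treating S2JSD-LSH as this template applied to some representation of the distribution is an assumption. The class name `PStableHash` is hypothetical, and so is feeding it the raw probability vector.

```python
from typing import Optional

import numpy as np


class PStableHash:
    """Hypothetical E2LSH-style family: h_j(x) = floor((a_j . x + b_j) / W).

    Not the paper's Eq. (11): any distribution-specific transform that
    S2JSD-LSH applies before projecting is omitted here.
    """

    def __init__(self, dim: int, num_hashes: int, W: float = 0.2,
                 seed: Optional[int] = None):
        if W <= 0:
            raise ValueError("partition interval W must be positive")
        rng = np.random.default_rng(seed)
        self.W = W
        self.a = rng.standard_normal((num_hashes, dim))  # Gaussian (2-stable) projections
        self.b = rng.uniform(0.0, W, size=num_hashes)    # random offsets in [0, W)

    def __call__(self, x) -> np.ndarray:
        """Map one vector (dim,) or a batch (n, dim) to integer bucket
        indices of shape (n, num_hashes)."""
        x = np.atleast_2d(np.asarray(x, dtype=float))
        return np.floor((x @ self.a.T + self.b) / self.W).astype(np.int64)


# Usage: hash two 4-bin distributions with the row's chosen W = 0.2.
hasher = PStableHash(dim=4, num_hashes=8, W=0.2, seed=0)
codes = hasher([[0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]])
```

Sweeping W in such a sketch (for example, over the row's range of 0.1 to 1.0) trades bucket granularity against collision rate. The summary does not describe how the paper turns bucket indices into binary codes of a given code size, so the sketch stops at integer bucket indices.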