S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions

Authors: Xian-Ling Mao, Bo-Si Feng, Yi-Jing Hao, Liqiang Nie, Heyan Huang, Guihua Wen

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.
Researcher Affiliation | Academia | Department of Computer Science, Beijing Institute of Technology, China; Department of Computing, National University of Singapore, Singapore; Department of Computer Science and Technology, South China University of Technology, China; {maoxl, 2120160986, 2220150504, hhy63}@bit.edu.cn, nieliqiang@gmail.com, crghwen@scut.edu.cn
Pseudocode | No | The paper includes mathematical formulations and derivations but does not present pseudocode or an algorithm block.
Open Source Code | Yes | We have released our codes to facilitate other researchers to repeat our experiments and validate their own ideas. https://www.dropbox.com/s/2yral5h23lwzipp/src.zip?dl=0
Open Datasets | Yes | Six publicly available image datasets, namely CIFAR10, CIFAR100-20, CIFAR100-100, Local-Patch, MNIST and COVTYPE, and two crawled text datasets are used to compare the proposed approach against state-of-the-art methods. CIFAR10 dataset consists of 60K 32x32 colour images in 10 classes. CIFAR-100 is just like the CIFAR-10, except that it has 20 coarse and 100 fine superclasses, denoted as CIFAR100-20 and CIFAR100-100. Local-Patch contains roughly 300K 32x32 image patches. MNIST consists of a total of 70000 handwritten digit samples. COVTYPE is a common benchmark featuring 54 dimensions.
Dataset Splits | Yes | All the experimental results are averaged over 10 random training/test partitions. For each partition, we randomly select 100 points with their tags as queries, and the remaining points and tags as reference database.
Hardware Specification | Yes | All experiments are conducted on our workstation with Intel(R) Xeon(R) CPU X7560@2.27GHz and 32G memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducibility.
Experiment Setup | Yes | Figure 2 shows the effect of the partition interval W in S2JSD-LSH hash functions (Eq.(11)) at different code size on the CIFAR100-100 and MNIST. As we can see, the trend of mAP values decreases when W changes from 0.1 to 1.0, and our method can achieve the best accuracy synthetically when W = 0.2 on both datasets. Similar trends have been observed over other datasets. In the following experiments, we set parameter W = 0.2 for S2JSD-LSH.
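
The split protocol quoted under Dataset Splits (10 random partitions, 100 query points each, remaining points as the reference database) and the mAP metric can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names `split_queries`, `average_precision` and `evaluate`, the `rank_fn` callback and the fixed seed are all hypothetical, and the ranking method itself (for example S2JSD-LSH with W = 0.2) is left to the caller.

```python
import numpy as np

def split_queries(n_points, n_queries=100, rng=None):
    """Randomly pick query indices; the remaining indices form the database."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(n_points)
    return perm[:n_queries], perm[n_queries:]

def average_precision(relevant):
    """Average precision of one ranked list (boolean flags in rank order)."""
    relevant = np.asarray(relevant, dtype=bool)
    hits = np.cumsum(relevant)
    if hits.size == 0 or hits[-1] == 0:
        return 0.0  # no relevant item retrieved
    ranks = np.arange(1, relevant.size + 1)
    # Mean of precision@k over the ranks k at which a relevant item appears.
    return float(np.sum((hits / ranks)[relevant]) / hits[-1])

def evaluate(features, labels, rank_fn, n_partitions=10, n_queries=100, seed=0):
    """Mean of per-partition mAP over several random query/database splits.

    rank_fn(query, database) is a hypothetical callback that returns database
    row indices ordered from most to least similar.
    """
    rng = np.random.default_rng(seed)  # seed is an assumption, for repeatability
    partition_scores = []
    for _ in range(n_partitions):
        q_idx, db_idx = split_queries(len(features), n_queries, rng)
        db_labels = labels[db_idx]
        aps = []
        for q in q_idx:
            order = rank_fn(features[q], features[db_idx])
            aps.append(average_precision(db_labels[order] == labels[q]))
        partition_scores.append(np.mean(aps))
    return float(np.mean(partition_scores))
```

A brute-force Euclidean ranking such as `lambda q, db: np.argsort(np.linalg.norm(db - q, axis=1))` can be passed as `rank_fn` to check the harness before plugging in a hashing method; it is only a stand-in and not one of the distances evaluated in the paper.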