S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions
Authors: Xian-Ling Mao, Bo-Si Feng, Yi-Jing Hao, Liqiang Nie, Heyan Huang, Guihua Wen
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve. |
| Researcher Affiliation | Academia | Department of Computer Science, Beijing Institute of Technology, China; Department of Computing, National University of Singapore, Singapore; Department of Computer Science and Technology, South China University of Technology, China. {maoxl, 2120160986, 2220150504, hhy63}@bit.edu.cn, nieliqiang@gmail.com, crghwen@scut.edu.cn |
| Pseudocode | No | The paper includes mathematical formulations and derivations but does not present pseudocode or an algorithm block. |
| Open Source Code | Yes | We have released our codes to facilitate other researchers to repeat our experiments and validate their own ideas (footnote 1: https://www.dropbox.com/s/2yral5h23lwzipp/src.zip?dl=0). |
| Open Datasets | Yes | Six publicly available image datasets, namely CIFAR10, CIFAR100-20, CIFAR100-100, Local-Patch, MNIST and COVTYPE, and two crawled text datasets are used to compare the proposed approach against state-of-the-art methods. CIFAR10 dataset consists of 60K 32x32 colour images in 10 classes. CIFAR-100 is just like the CIFAR-10, except that it has 20 coarse and 100 fine superclasses, denoted as CIFAR100-20 and CIFAR100-100. Local-Patch contains roughly 300K 32x32 image patches. MNIST consists of a total of 70000 handwritten digit samples. COVTYPE is a common benchmark featuring 54 dimensions. |
| Dataset Splits | Yes | All the experimental results are averaged over 10 random training/test partitions. For each partition, we randomly select 100 points with their tags as queries, and the remaining points and tags as reference database. |
| Hardware Specification | Yes | All experiments are conducted on our workstation with Intel(R) Xeon(R) CPU X7560@2.27GHz and 32G memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | Figure 2 shows the effect of the partition interval W in S2JSD-LSH hash functions (Eq.(11)) at different code size on the CIFAR100-100 and MNIST. As we can see, the trend of mAP values decreases when W changes from 0.1 to 1.0, and our method can achieve the best accuracy synthetically when W = 0.2 on both datasets. Similar trends have been observed over other datasets. In the following experiments, we set parameter W = 0.2 for S2JSD-LSH. |
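
The Dataset Splits row describes a protocol that can be sketched in a few lines: 10 random partitions, each holding out 100 randomly chosen points as queries and keeping the rest as the reference database, with results averaged across partitions. The sketch below is illustrative only and is not taken from the authors' released code; the function names `make_partitions` and `mean_over_partitions`, the `seed` argument, and the use of integer indices in place of actual data points are all assumptions.

```python
import random


def make_partitions(n_points, n_queries=100, n_partitions=10, seed=0):
    """Return a list of (query_indices, database_indices) pairs.

    Hypothetical helper: each partition shuffles all point indices,
    takes the first n_queries as queries and the remainder as the
    reference database, mirroring the split described in the table.
    """
    if not 0 < n_queries < n_points:
        raise ValueError("n_queries must be between 1 and n_points - 1")
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    partitions = []
    for _ in range(n_partitions):
        indices = list(range(n_points))
        rng.shuffle(indices)
        partitions.append((indices[:n_queries], indices[n_queries:]))
    return partitions


def mean_over_partitions(scores):
    """Average one per-partition metric (for example mAP) over all partitions."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)
```

A caller would evaluate a hashing method once per `(queries, database)` pair, collect one score per partition, and pass the resulting list to `mean_over_partitions` to obtain the reported average.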