Rejection Sampling for Weighted Jaccard Similarity Revisited
Authors: Xiaoyun Li, Ping Li (pp. 4197-4205)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted to compare the efficiency of different hashing methods, showing that ERS can be significantly faster in common scenarios when data are not extremely sparse. In particular, ERS substantially accelerates the original RS (Algorithm 1). |
| Researcher Affiliation | Industry | Xiaoyun Li, Ping Li; Cognitive Computing Lab, Baidu Research; 10900 NE 8th St., Bellevue, WA 98004, USA; {lixiaoyun996, pingli98}@gmail.com |
| Pseudocode | Yes | Algorithm 1 Rejection Sampling (RS) for Weighted Jaccard; Algorithm 2 Efficient Rejection Sampling (ERS); Algorithm 3 Densification for ERS method |
| Open Source Code | No | The paper states that for competing methods, they "use the same source code as in Christiani (2020)", but it does not provide concrete access to their own ERS implementation. |
| Open Datasets | Yes | We use the Words dataset (Li and Church 2005) and Caltech101 (Li, Fergus, and Perona 2007) (which was also used in Shrivastava (2016)). |
| Dataset Splits | Yes | Both datasets are randomly split 50/50 for training and testing. |
| Hardware Specification | Yes | All the tests are run using C++ on an Intel(R) Xeon(R) Platinum 8276 CPU 2.20GHz server with optimization flag -O3 enabled. |
| Software Dependencies | No | The paper mentions the use of "Mersenne Twister (mt19937) generator" and "XXHash64" but does not specify version numbers for general software dependencies or libraries. |
| Experiment Setup | Yes | The dimensionality is set to D = 2^16 = 65,536, and we vary the number of non-zeros d. Each non-zero entry is i.i.d. standard uniform. ... We test L = α 1/s with α = 0.5, 1, 5. ... We randomly generate 100 data vectors and report the average time for generating K = {256, 512, 1024} hash samples. ... Data features are preprocessed to have unit ℓ2 norm. |
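
The synthetic-data part of the experiment setup can be sketched roughly as follows. This is a minimal illustration, not the authors' C++ benchmark: it builds 100 sparse vectors of dimensionality D = 2^16 whose non-zero entries are i.i.d. standard uniform, and computes the exact weighted Jaccard similarity that the hashing methods estimate. The function names, the seed, and the choice of d = 500 non-zeros are assumptions made for the example; the paper varies d and its values are not quoted here.

```python
import random

D = 2 ** 16  # dimensionality stated in the setup (65,536)


def make_sparse_vector(d, rng, dim=D):
    """Return {index: weight} with d non-zeros, weights i.i.d. uniform."""
    indices = rng.sample(range(dim), d)
    # 1.0 - rng.random() lies in (0, 1], so no sampled weight is exactly zero.
    return {i: 1.0 - rng.random() for i in indices}


def weighted_jaccard(u, v):
    """Exact weighted Jaccard: sum_i min(u_i, v_i) / sum_i max(u_i, v_i)."""
    keys = u.keys() | v.keys()
    num = sum(min(u.get(i, 0.0), v.get(i, 0.0)) for i in keys)
    den = sum(max(u.get(i, 0.0), v.get(i, 0.0)) for i in keys)
    return num / den if den > 0 else 0.0


# Hypothetical settings: seed 0 and d = 500 are illustrative only.
rng = random.Random(0)
vectors = [make_sparse_vector(d=500, rng=rng) for _ in range(100)]
```

A sparse dictionary representation is used here because only d of the 65,536 coordinates are non-zero; timing the generation of K hash samples per vector, as the setup describes, would be layered on top of data produced this way.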