Norm-Ranging LSH for Maximum Inner Product Search
Authors: Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that NORM-RANGING LSH probes far fewer items than SIMPLE-LSH at the same recall, thus significantly benefiting MIPS-based applications. |
| Researcher Affiliation | Academia | Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng; Department of Computer Science, The Chinese University of Hong Kong, Shatin, Hong Kong; {xyan, jfli, xydai, hzchen, jcheng}@cse.cuhk.edu.hk |
| Pseudocode | Yes | The index building and query processing procedures of RANGE-LSH are presented in Algorithm 1 and Algorithm 2, respectively. |
| Open Source Code | Yes | Experiment code: https://github.com/xinyandai/similarity-search/tree/mipsex |
| Open Datasets | Yes | We used three popular datasets, i.e., Netflix, Yahoo!Music and ImageNet, in the experiments. For the Netflix dataset and Yahoo!Music dataset, the user and item embeddings were obtained using alternating least square based matrix factorization [Yun et al., 2013]... The ImageNet dataset contains more than 2 million SIFT descriptors of the ImageNet images... |
| Dataset Splits | No | The paper uses the Netflix, Yahoo!Music, and ImageNet datasets and samples queries from them, but it does not provide specific train/validation/test splits (e.g., percentages or counts) or reference standard predefined splits for reproducibility beyond stating '1000 randomly selected queries' and 'used the rest as dataset items'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using LSH functions and algorithms but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For L2-ALSH, we used the parameter setting recommended by its authors, i.e., m = 3, U = 0.83, r = 2.5. For RANGE-LSH... We partitioned the dataset into 32, 64 and 128 sub-datasets under a code length of 16, 32 and 64, respectively. |
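
The Pseudocode and Experiment Setup rows mention norm-based partitioning only briefly. As a minimal sketch (not the authors' implementation), the Python below shows one way the index-building step could look, assuming equal-size partitions by norm rank, normalization of each sub-dataset by its own maximum norm, and SIMPLE-LSH-style sign-random-projection codes. The function name `build_range_lsh_index`, its parameters, and the use of one shared set of hyperplanes are illustrative assumptions, not details taken from the paper or its repository.

```python
import numpy as np


def build_range_lsh_index(items, num_parts, num_bits, seed=0):
    """Hypothetical sketch: partition items by norm rank, then hash each part.

    items:     (n, d) array of item vectors
    num_parts: number of sub-datasets (e.g. 32, 64 or 128 in the setup above)
    num_bits:  number of sign-random-projection bits per item (assumed value)
    """
    rng = np.random.default_rng(seed)
    dim = items.shape[1]
    # Assumption: one set of random hyperplanes shared by all sub-datasets,
    # drawn in the (dim + 1)-dimensional transformed space.
    planes = rng.standard_normal((num_bits, dim + 1))

    norms = np.linalg.norm(items, axis=1)
    order = np.argsort(norms)  # item ids sorted by ascending 2-norm

    parts = []
    for ids in np.array_split(order, num_parts):
        if ids.size == 0:
            continue
        local_max = norms[ids].max()
        if local_max == 0.0:
            local_max = 1.0  # guard against a sub-dataset of all-zero vectors
        # Normalize by the local maximum norm so every norm is at most 1.
        scaled = items[ids] / local_max
        # SIMPLE-LSH-style transform: append sqrt(1 - ||x||^2) so that the
        # transformed vectors have unit norm.
        tail = np.sqrt(np.clip(1.0 - np.sum(scaled ** 2, axis=1), 0.0, None))
        transformed = np.hstack([scaled, tail[:, None]])
        # Sign random projection: one bit per hyperplane.
        codes = (transformed @ planes.T) >= 0
        parts.append({"ids": ids, "max_norm": local_max, "codes": codes})
    return parts, planes
```

A call such as `build_range_lsh_index(items, num_parts=32, num_bits=11)` would correspond to the 32-sub-dataset setting only under the further assumption that 5 of the 16 code bits are reserved for the sub-dataset index; the table above does not state how the bits are divided.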