Norm-Ranging LSH for Maximum Inner Product Search

Authors: Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that NORM-RANGING LSH probes far fewer items than SIMPLE-LSH at the same recall, thus significantly benefiting MIPS-based applications.
Researcher Affiliation | Academia | Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng, Department of Computer Science, The Chinese University of Hong Kong, Shatin, Hong Kong. {xyan, jfli, xydai, hzchen, jcheng}@cse.cuhk.edu.hk
Pseudocode | Yes | The index building and query processing procedures of RANGE-LSH are presented in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | Yes | Experiment code: https://github.com/xinyandai/similarity-search/tree/mipsex
Open Datasets | Yes | We used three popular datasets, i.e., Netflix, Yahoo!Music and ImageNet, in the experiments. For the Netflix and Yahoo!Music datasets, the user and item embeddings were obtained using alternating-least-squares-based matrix factorization [Yun et al., 2013]... The ImageNet dataset contains more than 2 million SIFT descriptors of the ImageNet images...
Dataset Splits | No | The paper uses the Netflix, Yahoo!Music, and ImageNet datasets and samples queries, but does not provide specific train/validation/test splits (e.g., percentages or counts) or reference standard predefined splits; for reproducibility it states only that it used '1000 randomly selected queries' and 'used the rest as dataset items'.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using LSH functions and algorithms but does not specify any software dependencies with version numbers.
Experiment Setup | Yes | For L2-ALSH, we used the parameter setting recommended by its authors, i.e., m = 3, U = 0.83, r = 2.5. For RANGE-LSH... We partitioned the dataset into 32, 64 and 128 sub-datasets under a code length of 16, 32 and 64, respectively.
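The table notes that the paper's Algorithm 1 (index building) and Algorithm 2 (query processing) define RANGE-LSH, and that the dataset is partitioned into sub-datasets. As a rough reproduction aid, the sketch below illustrates the core idea under stated assumptions: items are partitioned by L2 norm into sub-datasets, each sub-dataset is transformed with the SIMPLE-LSH mapping using its local maximum norm, and the transformed unit vectors are hashed with signed random projections. The function names and the cosine-of-Hamming-distance ranking are illustrative choices, not the paper's exact procedures.

```python
import numpy as np

def simple_lsh_transform(X, M):
    """SIMPLE-LSH item transform with scaling constant M:
    x -> [x / M ; sqrt(1 - ||x/M||^2)], which is a unit vector."""
    Xs = X / M
    tail = np.sqrt(np.maximum(0.0, 1.0 - np.sum(Xs ** 2, axis=1)))
    return np.hstack([Xs, tail[:, None]])

def query_transform(q):
    """Query transform: q -> [q / ||q|| ; 0]."""
    return np.append(q / np.linalg.norm(q), 0.0)

def build_range_lsh(X, n_parts, code_len, rng):
    """Partition items into n_parts sub-datasets by ascending norm,
    transform each with its local maximum norm, and hash with
    signed random projections (hyperplane LSH)."""
    order = np.argsort(np.linalg.norm(X, axis=1))
    planes = rng.standard_normal((code_len, X.shape[1] + 1))
    index = []
    for ids in np.array_split(order, n_parts):
        M = np.linalg.norm(X[ids], axis=1).max()  # local max norm
        P = simple_lsh_transform(X[ids], M)
        codes = P @ planes.T > 0
        index.append((ids, M, codes))
    return index, planes

def query(q, index, planes):
    """Rank items by estimated inner product M * cos(pi * h / b),
    where h is the Hamming distance over b bits, scaled by each
    sub-dataset's local max norm; return the best candidate id.
    This is a crude stand-in for the paper's Algorithm 2."""
    qc = planes @ query_transform(q) > 0
    code_len = planes.shape[0]
    best, best_score = -1, -np.inf
    for ids, M, codes in index:
        ham = (codes != qc).sum(axis=1)
        scores = M * np.cos(np.pi * ham / code_len)
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best, best_score = int(ids[j]), scores[j]
    return best
```

The key property the partition exploits: with a local (rather than global) maximum norm M, the tail term sqrt(1 - ||x/M||^2) stays small, so the transform distorts angles less, and the exact identity P(x) . Q(q) * M * ||q|| = x . q still holds within each sub-dataset.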