Norm-Ranging LSH for Maximum Inner Product Search

Authors: Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that NORM-RANGING LSH probes far fewer items than SIMPLE-LSH at the same recall, thus significantly benefiting MIPS-based applications.
Researcher Affiliation | Academia | Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng, Department of Computer Science, The Chinese University of Hong Kong, Shatin, Hong Kong. {xyan, jfli, xydai, hzchen, jcheng}@cse.cuhk.edu.hk
Pseudocode | Yes | The index building and query processing procedures of RANGE-LSH are presented in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | Yes | Experiment code: https://github.com/xinyandai/similarity-search/tree/mipsex
Open Datasets | Yes | We used three popular datasets, i.e., Netflix, Yahoo!Music and ImageNet, in the experiments. For the Netflix and Yahoo!Music datasets, the user and item embeddings were obtained using alternating-least-squares-based matrix factorization [Yun et al., 2013]... The ImageNet dataset contains more than 2 million SIFT descriptors of the ImageNet images...
Dataset Splits | No | The paper uses the Netflix, Yahoo!Music, and ImageNet datasets and samples queries, but does not provide specific train/validation/test splits (e.g., percentages or counts) or reference standard predefined splits; for reproducibility it states only that it used '1000 randomly selected queries' and 'used the rest as dataset items'.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using LSH functions and algorithms but does not specify any software dependencies with version numbers.
Experiment Setup | Yes | For L2-ALSH, we used the parameter setting recommended by its authors, i.e., m = 3, U = 0.83, r = 2.5. For RANGE-LSH... We partitioned the dataset into 32, 64 and 128 sub-datasets under a code length of 16, 32 and 64, respectively.
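The table notes that the paper's Algorithm 1 (index building) and Algorithm 2 (query processing) define RANGE-LSH, and that the dataset is partitioned into sub-datasets. As a rough reproduction aid, the sketch below illustrates the core idea under stated assumptions: items are partitioned by L2 norm into sub-datasets, each sub-dataset is transformed with the SIMPLE-LSH mapping using its local maximum norm, and the transformed unit vectors are hashed with signed random projections. The function names and the cosine-of-Hamming-distance ranking are illustrative choices, not the paper's exact procedures.

```python
import numpy as np

def simple_lsh_transform(X, M):
    """SIMPLE-LSH item transform with scaling constant M:
    x -> [x / M ; sqrt(1 - ||x/M||^2)], which is a unit vector."""
    Xs = X / M
    tail = np.sqrt(np.maximum(0.0, 1.0 - np.sum(Xs ** 2, axis=1)))
    return np.hstack([Xs, tail[:, None]])

def query_transform(q):
    """Query transform: q -> [q / ||q|| ; 0]."""
    return np.append(q / np.linalg.norm(q), 0.0)

def build_range_lsh(X, n_parts, code_len, rng):
    """Partition items into n_parts sub-datasets by ascending norm,
    transform each with its local maximum norm, and hash with
    signed random projections (hyperplane LSH)."""
    order = np.argsort(np.linalg.norm(X, axis=1))
    planes = rng.standard_normal((code_len, X.shape[1] + 1))
    index = []
    for ids in np.array_split(order, n_parts):
        M = np.linalg.norm(X[ids], axis=1).max()  # local max norm
        P = simple_lsh_transform(X[ids], M)
        codes = P @ planes.T > 0
        index.append((ids, M, codes))
    return index, planes

def query(q, index, planes):
    """Rank items by estimated inner product M * cos(pi * h / b),
    where h is the Hamming distance over b bits, scaled by each
    sub-dataset's local max norm; return the best candidate id.
    This is a crude stand-in for the paper's Algorithm 2."""
    qc = planes @ query_transform(q) > 0
    code_len = planes.shape[0]
    best, best_score = -1, -np.inf
    for ids, M, codes in index:
        ham = (codes != qc).sum(axis=1)
        scores = M * np.cos(np.pi * ham / code_len)
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best, best_score = int(ids[j]), scores[j]
    return best
```

The key property the partition exploits: with a local (rather than global) maximum norm M, the tail term sqrt(1 - ||x/M||^2) stays small, so the transform distorts angles less, and the exact identity P(x) . Q(q) * M * ||q|| = x . q still holds within each sub-dataset.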