Fast Similarity Search via Optimal Sparse Lifting

Authors: Wenye Li, Jingwei Mao, Yin Zhang, Shuguang Cui

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To verify the effectiveness of the proposed work, we carried out a series of experiments. It was found that, for given data, our approach could produce the optimal sparse lifting and the sparse lifting operator with high quality. It reported consistently and significantly improved precision and speed in similarity search applications on various datasets. Hence our work provides a solution for practical applications."
Researcher Affiliation | Academia | (1) The Chinese University of Hong Kong, Shenzhen, China; (2) Shenzhen Research Institute of Big Data, Shenzhen, China. Emails: {wyli,yinzhang,shuguangcui}@cuhk.edu.cn, 216019005@link.cuhk.edu.cn
Pseudocode | Yes | Algorithm 1: min_Y ‖XᵀX − YᵀY‖_F²  s.t.  Y ∈ {0, 1}^{d′×m}
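The recovered objective is easy to evaluate directly. The sketch below (NumPy, illustrative only — the paper optimizes this problem with the IBM ILOG CPLEX linear-program solver, whereas this snippet merely scores a candidate binary lifting Y against the original data X) computes the squared Frobenius error between the two Gram matrices:

```python
import numpy as np

def lifting_objective(X, Y):
    """Squared Frobenius error ||X^T X - Y^T Y||_F^2 between the
    pairwise-similarity (Gram) matrix of the original data X (d x m)
    and that of a candidate binary sparse lifting Y (d' x m)."""
    G_orig = X.T @ X      # m x m Gram matrix of the input vectors
    G_lift = Y.T @ Y      # m x m Gram matrix of the lifted 0/1 codes
    return float(np.linalg.norm(G_orig - G_lift, "fro") ** 2)

# Toy example: m = 5 points in d = 3 dims, lifted to d' = 8 binary dims.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Y = (rng.random((8, 5)) < 0.25).astype(float)  # random sparse 0/1 candidate
print(lifting_objective(X, Y))
```

A perfect lifting drives this objective to zero, i.e. the binary codes preserve all pairwise inner products of the original vectors.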
Open Source Code | No | The paper does not provide any explicit statement about releasing the source code or a link to a code repository for the described methodology.
Open Datasets | Yes | SIFT: SIFT descriptors of images used for similarity search (d = 128) [14]. GLOVE: pre-trained word vectors from the GloVe algorithm (d = 300) [25]. MNIST: 28 × 28 images of handwritten digits in 256-level grayscale (d = 784) [18].
Dataset Splits | No | The paper specifies training and testing sets (e.g., "a subset of 10,000 samples from each dataset were used as the testing set" and "we randomly selected 5,000 different samples from each dataset as the training set"), but it does not mention a separate validation set or provide details on how the data was split for validation.
Hardware Specification | No | The paper mentions running experiments on "only one CPU core" but does not specify the model or type of CPU, nor any other hardware components such as GPU or memory.
Software Dependencies | No | The paper states "Our approach used IBM ILOG CPLEX Optimizer as the underlying linear program solver," but does not specify a version number for CPLEX or any other software dependencies.
Experiment Setup | Yes | Specifically, we had an experiment to illustrate the effectiveness of the optimal sparse lifting (ref. Section 4.2), an experiment with the same similarity-search scenario as in [8] to demonstrate the empirical superiority of the proposed optimal lifting operator (ref. Section 4.3), and an experiment to show the running-speed comparison in a query application (ref. Section 4.4). For our proposed approach, we randomly selected 5,000 different samples from each dataset as the training set. For the fly and the optimal lifting algorithms, the output dimensions are set to d = 20k and d = 2,000 respectively. The horizontal axis shows different hash lengths (k = 2, 4, 8, 16, 32 respectively).
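The fly-style baseline from [8] that this setup compares against can be sketched compactly: project with a sparse random binary matrix into the higher-dimensional space, then keep only the top-k responses per sample (winner-take-all), giving a 0/1 code whose hash length is k. The snippet below is a minimal illustrative sketch under these assumptions, not the paper's implementation; the function name `fly_hash` and the `density` parameter are our own choices.

```python
import numpy as np

def fly_hash(X, d_out=2000, k=32, density=0.1, seed=0):
    """Fly-style sparse lifting baseline (after [8]): project with a
    sparse random 0/1 matrix, then keep the top-k responses per sample
    (winner-take-all) as a binary code of hash length k.
    X: (n_samples, d_in); returns (n_samples, d_out) 0/1 codes."""
    rng = np.random.default_rng(seed)
    d_in = X.shape[1]
    M = (rng.random((d_in, d_out)) < density).astype(float)  # sparse 0/1 projection
    A = X @ M                                   # responses in the lifted space
    codes = np.zeros_like(A)
    top = np.argpartition(A, -k, axis=1)[:, -k:]  # indices of the k largest responses
    np.put_along_axis(codes, top, 1.0, axis=1)
    return codes

# Example: hash 100 random 128-dim "SIFT-like" vectors with hash length k = 16.
X = np.random.default_rng(1).standard_normal((100, 128))
C = fly_hash(X, d_out=2000, k=16)
print(C.shape)  # each row is a binary code with exactly k ones
```

Nearest-neighbor queries then compare these sparse binary codes (e.g., by code overlap) instead of the dense original vectors, which is what drives the speedups measured in Section 4.4.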