Fast Similarity Search via Optimal Sparse Lifting
Authors: Wenye Li, Jingwei Mao, Yin Zhang, Shuguang Cui
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of the proposed work, we carried out a series of experiments. We found that, for given data, our approach could produce the optimal sparse lifting and the sparse lifting operator with high quality, and it reported consistently and significantly improved precision and speed in similarity-search applications on various datasets. Hence our work provides a solution for practical applications. |
| Researcher Affiliation | Academia | ¹The Chinese University of Hong Kong, Shenzhen, China; ²Shenzhen Research Institute of Big Data, Shenzhen, China. {wyli,yinzhang,shuguangcui}@cuhk.edu.cn, 216019005@link.cuhk.edu.cn |
| Pseudocode | Yes | Algorithm 1: $\min_Y \|X^\top X - Y^\top Y\|_F^2$ s.t. $Y \in \mathcal{Y} \subseteq \{0,1\}^{d' \times m}$ (a code sketch illustrating this objective follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | SIFT: SIFT descriptors of images used for similarity search (d = 128) [14]. GLOVE: Pre-trained word vectors based on the GloVe algorithm (d = 300) [25]. MNIST: 28×28 images of handwritten digits in 256-grayscale pixels (d = 784) [18]. |
| Dataset Splits | No | The paper specifies training and testing sets (e.g., 'a subset of 10,000 samples from each dataset were used as the testing set' and 'we randomly selected 5,000 different samples from each dataset as the training set'), but it does not mention a separate validation set or provide details on how the data was split for validation. |
| Hardware Specification | No | The paper mentions running experiments on 'only one CPU core' but does not specify the model or type of CPU, nor any other hardware components like GPU or memory. |
| Software Dependencies | No | The paper states 'Our approach used IBM ILOG CPLEX Optimizer as the underlying linear program solver.' but does not specify a version number for CPLEX or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we had an experiment to illustrate the effectiveness of the optimal sparse lifting (ref. Section 4.2), an experiment with the same similarity-search scenario as in [8] to demonstrate the empirical superiority of the proposed optimal lifting operator (ref. Section 4.3), and an experiment to show the running-speed comparison in a query application (ref. Section 4.4). For our proposed approach, we randomly selected 5,000 different samples from each dataset as the training set. For the fly and the optimal lifting algorithms, the output dimensions are set to d′ = 20k and d′ = 2,000 respectively. The horizontal axis shows different hash lengths (k = 2, 4, 8, 16, 32 respectively). (A precision-evaluation sketch over these hash lengths also follows the table.) |
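
To make the Algorithm 1 objective concrete, below is a minimal Python sketch. It evaluates the Frobenius-norm objective $\|X^\top X - Y^\top Y\|_F^2$ for a binary lifting Y, and builds a fly-style baseline (random sparse 0/1 projection followed by winner-take-all), since the paper's actual solver (IBM ILOG CPLEX) is not shown here. The function names `fly_lifting` and `lifting_objective`, the `density` parameter, and the toy random data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fly_lifting(X, d_lift=2000, k=32, density=0.1, seed=0):
    """Random sparse binary lifting in the spirit of the fly algorithm [8]
    (an illustrative stand-in, NOT the paper's optimal lifting): multiply by
    a sparse random 0/1 projection, then keep only the k largest activations
    per sample (winner-take-all) as a binary code."""
    rng = np.random.default_rng(seed)
    d, m = X.shape
    M = (rng.random((d_lift, d)) < density).astype(X.dtype)  # sparse 0/1 projection
    A = M @ X                                                # activations, d_lift x m
    Y = np.zeros_like(A)
    top = np.argpartition(-A, k, axis=0)[:k]                 # k largest rows per column
    np.put_along_axis(Y, top, 1.0, axis=0)
    return Y

def lifting_objective(X, Y):
    """Frobenius objective ||X^T X - Y^T Y||_F^2 from Algorithm 1
    (any scaling constant the paper may apply is omitted here)."""
    return np.linalg.norm(X.T @ X - Y.T @ Y, ord="fro") ** 2

# toy usage: random vectors standing in for SIFT descriptors (d = 128)
X = np.random.default_rng(1).standard_normal((128, 500))
Y = fly_lifting(X, d_lift=2000, k=32)
print(lifting_objective(X, Y))
```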
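
The setup in Section 4.3 sweeps hash lengths k = 2, 4, 8, 16, 32 and reports retrieval precision. Below is a hedged sketch of one such evaluation, reusing `fly_lifting` and `X` from the block above; the helper `retrieval_precision` and its parameters are hypothetical and only approximate the paper's precision metric.

```python
import numpy as np

def retrieval_precision(X, Y, n_neighbors=100, n_queries=50):
    """Fraction of true nearest neighbors (by dot product in the input
    space) recovered among the top candidates ranked by the binary codes."""
    S_true = X.T @ X   # input-space similarities
    S_code = Y.T @ Y   # code-space similarities (overlap of binary codes)
    hits = []
    for q in range(n_queries):
        true_nn = [i for i in np.argsort(-S_true[q]) if i != q][:n_neighbors]
        code_nn = [i for i in np.argsort(-S_code[q]) if i != q][:n_neighbors]
        hits.append(len(set(true_nn) & set(code_nn)) / n_neighbors)
    return float(np.mean(hits))

# sweep the hash lengths used in the paper's plots
for k in (2, 4, 8, 16, 32):
    Y = fly_lifting(X, d_lift=2000, k=k)
    print(k, retrieval_precision(X, Y))
```

Longer codes (larger k) should recover more of the true neighbors, which mirrors the trend the paper plots against hash length; absolute numbers from this toy sketch are not comparable to the reported results.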