Anisotropic Additive Quantization for Fast Inner Product Search
Authors: Jin Zhang, Qi Liu, Defu Lian, Zheng Liu, Le Wu, Enhong Chen
AAAI 2022, pp. 4354-4362
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed algorithm is extensively evaluated on three real-world datasets. The experimental results show that it outperforms the state-of-the-art baselines with respect to approximate search accuracy while guaranteeing a similar retrieval efficiency. |
| Researcher Affiliation | Collaboration | Jin Zhang¹, Qi Liu¹, Defu Lian¹*, Zheng Liu², Le Wu³, Enhong Chen¹. ¹University of Science and Technology of China; ²Microsoft Research Asia; ³Hefei University of Technology. Emails: {abczj, qiliu67}@mail.ustc.edu.cn; {liandefu, cheneh}@ustc.edu.cn; Zheng.Liu@microsoft.com; lewu.ustc@gmail.com |
| Pseudocode | No | The paper describes an 'Optimization Procedure' with numbered steps but does not present it as a formal pseudocode block or algorithm labeled as such. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a repository for the described methodology. |
| Open Datasets | Yes | We use three real-world datasets to evaluate our algorithm. As maximum inner product search is often used in recommender systems, two of the datasets, LastFM and Echo Nest, are well-known music recommendation datasets. In addition, we also compare our algorithms on Glove1.2M, which is used in ScaNN (Guo et al. 2020) for evaluation. The LastFM dataset contains 357,847 items rated by 156,122 users; the Echo Nest dataset contains 260,417 items rated by 766,882 users. We use matrix decomposition to train them into 32-dimensional embedding vectors as described in (Lian et al. 2015), and 10,000 users are randomly selected as queries (see the factorization sketch after this table). The Glove1.2M dataset is a collection of 1.2 million 100-dimensional word embeddings trained as described in (Pennington, Socher, and Manning 2014). |
| Dataset Splits | No | The paper mentions that 10,000 users are randomly selected as queries but does not specify the training, validation, and test dataset splits with percentages or sample counts for the main datasets. |
| Hardware Specification | No | The paper mentions 'a Linux server with 3.00GHZ intel GPU and 300G main memory' but does not provide specific models for the CPU or GPU, only a generic 'intel GPU' (presumably a CPU) and a clock speed, which is not specific enough under the guidelines. |
| Software Dependencies | No | The paper states that 'All the quantization methods are implemented in Python' but does not specify the version of Python or any other software dependencies with their version numbers. |
| Experiment Setup | Yes | In addition to the above descriptions, for the two music datasets we use 8 codebooks in the quantization methods, with the anisotropic quantization threshold T set to 0.1 times the mean norm of the datapoints on Echo Nest and 0.05 times the mean norm of the datapoints on LastFM. The Glove dataset is consistent with ScaNN, using 50 codebooks with T set to 0.2. The number of codewords is set to 16 except for Echo Nest, where it is set to 128 (see the anisotropic-threshold sketch below). |
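
The "Open Datasets" row quotes the paper's preparation pipeline: user-item interactions are factorized into 32-dimensional embeddings (following Lian et al. 2015), the item vectors become the search database, and 10,000 random user vectors serve as queries. The sketch below is a minimal stand-in using plain alternating least squares on a dense toy matrix; the function and variable names are hypothetical, and the exact objective of (Lian et al. 2015) is not reproduced here.

```python
import numpy as np

def als_factorize(R, dim=32, n_iters=15, reg=0.1, seed=0):
    """Plain alternating least squares on a dense interaction matrix R
    (n_users x n_items). A stand-in for the factorization the paper
    attributes to (Lian et al. 2015), whose exact objective differs."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
    V = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings
    I = np.eye(dim)
    for _ in range(n_iters):
        # Closed-form ridge-regression updates, alternating U and V.
        U = R @ V @ np.linalg.inv(V.T @ V + reg * I)
        V = R.T @ U @ np.linalg.inv(U.T @ U + reg * I)
    return U, V

# Toy stand-in for the LastFM / Echo Nest interaction matrices.
R = (np.random.default_rng(1).random((1000, 500)) > 0.95).astype(float)
U, V = als_factorize(R)        # V: item embeddings, i.e. the database
# The paper samples 10,000 users as queries; 100 here for the toy scale.
queries = U[np.random.default_rng(2).choice(len(U), size=100, replace=False)]
```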
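The "Experiment Setup" row fixes the anisotropic threshold T as a multiple of the mean datapoint norm. Below is a minimal sketch of how such a threshold feeds a score-aware loss, assuming the parallel/orthogonal residual decomposition and the weight ratio η inherited from ScaNN (Guo et al. 2020); all names are hypothetical, and this is not the paper's own implementation.

```python
import numpy as np

def weight_ratio(T, x_norm, dim):
    """h_parallel / h_orthogonal implied by threshold T (the eta of
    Guo et al. 2020) -- stated as an assumption, not from this paper."""
    t2 = (T / x_norm) ** 2
    return (dim - 1) * t2 / (1.0 - t2)

def anisotropic_loss(x, x_hat, eta):
    """Weight the residual component parallel to x by eta relative to
    the component orthogonal to x."""
    r = x - x_hat
    r_par = (r @ x / (x @ x)) * x        # residual along x
    r_perp = r - r_par                   # residual orthogonal to x
    return eta * (r_par @ r_par) + r_perp @ r_perp

# Settings from the row above (Echo Nest): 8 codebooks, 128 codewords,
# T = 0.1 * mean datapoint norm. X is random stand-in data.
X = np.random.default_rng(0).normal(size=(10_000, 32))
mean_norm = np.linalg.norm(X, axis=1).mean()
T = 0.1 * mean_norm    # 0.05 * mean norm on LastFM; T = 0.2 on Glove1.2M
x = X[0]
eta = weight_ratio(T, np.linalg.norm(x), X.shape[1])
loss = anisotropic_loss(x, x_hat=0.9 * x, eta=eta)  # toy reconstruction
```

Note how the threshold only enters through η: a larger T (relative to the datapoint norm) penalizes the parallel residual more heavily, which is what biases the quantizer toward preserving inner products.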