Faster Linear Algebra for Distance Matrices
Authors: Piotr Indyk, Sandeep Silwal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an empirical evaluation of our matrix-vector query for the ℓ1 distance function. We chose to implement our ℓ1 upper bound since it is a clean algorithm which possesses many of the same underlying algorithmic ideas as some of our other upper bound results. We envision that similar empirical results hold for most of our upper bounds in Table 1. |
| Researcher Affiliation | Academia | Piotr Indyk MIT indyk@mit.edu Sandeep Silwal MIT silwal@mit.edu |
| Pseudocode | Yes | Algorithm 1: Preprocessing; Algorithm 2: Matrix-Vector Query for p = 1 |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is included in the supplementary. |
| Open Datasets | Yes | The second dataset is the standard MNIST dataset [LeC98] and finally, our large dataset is GloVe word embeddings in R^50 [PSM14]. |
| Dataset Splits | No | The paper does not explicitly mention specific training, validation, or test dataset splits (e.g., percentages or counts) for its experiments. |
| Hardware Specification | Yes | Our experiments are done in a 2021 M1 Macbook Pro with 32 gigabytes of RAM. |
| Software Dependencies | No | We implement all algorithms in Python 3.9 using Numpy with Numba acceleration to speed up all algorithms whenever possible. While Python 3.9 is specified, explicit version numbers for Numpy and Numba are not provided. |
| Experiment Setup | Yes | We chose two real and one synthetic dataset for our experiments. We have two "small" datasets and one "large" dataset. The two small datasets have 5 × 10^4 points whereas the large dataset has approximately 10^6 points. The first dataset is points drawn from a mixture of three spherical Gaussians in R^50. The second dataset is the standard MNIST dataset [LeC98] and finally, our large dataset is GloVe word embeddings in R^50 [PSM14]. The naive algorithm for the small datasets is the following: we initialize the full distance matrix (which will count towards preprocessing), and then we use the full distance matrix to perform a matrix-vector query. Query times are averaged over 10 trials with Gaussian vectors as queries. |
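The naive baseline quoted above (precompute the full ℓ1 distance matrix, then answer a matrix-vector query by a direct product) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's released code; the function names `naive_preprocess` and `naive_query` are hypothetical, and the toy dataset is far smaller than the 5 × 10^4-point datasets used in the paper.

```python
import numpy as np

def naive_preprocess(X):
    """Build the full n x n L1 distance matrix D[i, j] = ||x_i - x_j||_1.

    This O(n^2 d) step counts towards preprocessing in the paper's setup.
    """
    # Broadcast (n, 1, d) - (1, n, d) -> (n, n, d), then sum |.| over d.
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)

def naive_query(D, z):
    """Answer a matrix-vector query: return D @ z in O(n^2) time."""
    return D @ z

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))  # toy stand-in for an n-point dataset in R^50
D = naive_preprocess(X)

# As in the paper's setup, queries are Gaussian vectors.
z = rng.standard_normal(200)
y = naive_query(D, z)
```

The sub-quadratic algorithms in the paper (Algorithms 1 and 2) avoid materializing `D` entirely; the point of this baseline is only to fix what "preprocessing" and "query" mean when timing the comparison.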