Faster Linear Algebra for Distance Matrices

Authors: Piotr Indyk, Sandeep Silwal

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We perform an empirical evaluation of our matrix-vector query for the 1 distance function. We chose to implement our 1 upper bound since it s a clean algorithm which possesses many of the same underlying algorithmic ideas as some of our other upper bound results. We envision that similar empirical results hold for most of our upper bounds in Table 1.
Researcher Affiliation Academia Piotr Indyk MIT indyk@mit.edu Sandeep Silwal MIT silwal@mit.edu
Pseudocode Yes Algorithm 1 Preprocessing; Algorithm 2 matrix-vector Query for p = 1
Open Source Code Yes Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is included in the supplementary.
Open Datasets Yes The second dataset is the standard MNIST dataset [Le C98] and finally, our large dataset is Glove word embeddings2 in R50 [PSM14].
Dataset Splits No The paper does not explicitly mention specific training, validation, or test dataset splits (e.g., percentages or counts) for its experiments.
Hardware Specification Yes Our experiments are done in a 2021 M1 Macbook Pro with 32 gigabytes of RAM.
Software Dependencies No We implement all algorithms in Python 3.9 using Numpy with Numba acceleration to speed up all algorithms whenever possible. While Python 3.9 is specified, explicit version numbers for Numpy and Numba are not provided.
Experiment Setup Yes We chose two real and one synthetic dataset for our experiments. We have two small" datasets and one large" dataset. The two small datasets have 5 104 points whereas the large dataset has approximately 106 points. The first dataset is points drawn from a mixture of three spherical Gaussians in R50. The second dataset is the standard MNIST dataset [Le C98] and finally, our large dataset is Glove word embeddings2 in R50 [PSM14]. The naive algorithm for the small datasets is the following: we initialize the full distance matrix (which will count towards preprocessing), and then we use the full distance matrix to perform a matrix-vector query. Query times are averaged over 10 trials with Gaussian vectors as queries.