Faster Linear Algebra for Distance Matrices
Authors: Piotr Indyk, Sandeep Silwal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an empirical evaluation of our matrix-vector query for the ℓ1 distance function. We chose to implement our ℓ1 upper bound since it is a clean algorithm which possesses many of the same underlying algorithmic ideas as some of our other upper bound results. We envision that similar empirical results hold for most of our upper bounds in Table 1. |
| Researcher Affiliation | Academia | Piotr Indyk MIT indyk@mit.edu Sandeep Silwal MIT silwal@mit.edu |
| Pseudocode | Yes | Algorithm 1: Preprocessing; Algorithm 2: Matrix-Vector Query for p = 1 |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is included in the supplementary. |
| Open Datasets | Yes | The second dataset is the standard MNIST dataset [LeC98] and finally, our large dataset is GloVe word embeddings in R^50 [PSM14]. |
| Dataset Splits | No | The paper does not explicitly mention specific training, validation, or test dataset splits (e.g., percentages or counts) for its experiments. |
| Hardware Specification | Yes | Our experiments are done in a 2021 M1 Macbook Pro with 32 gigabytes of RAM. |
| Software Dependencies | No | We implement all algorithms in Python 3.9 using Numpy with Numba acceleration to speed up all algorithms whenever possible. While Python 3.9 is specified, explicit version numbers for Numpy and Numba are not provided. |
| Experiment Setup | Yes | We chose two real and one synthetic dataset for our experiments. We have two "small" datasets and one "large" dataset. The two small datasets have 5 × 10^4 points whereas the large dataset has approximately 10^6 points. The first dataset is points drawn from a mixture of three spherical Gaussians in R^50. The second dataset is the standard MNIST dataset [LeC98] and finally, our large dataset is GloVe word embeddings in R^50 [PSM14]. The naive algorithm for the small datasets is the following: we initialize the full distance matrix (which will count towards preprocessing), and then we use the full distance matrix to perform a matrix-vector query. Query times are averaged over 10 trials with Gaussian vectors as queries. |
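The naive baseline quoted above (precompute the full ℓ1 distance matrix, then answer a matrix-vector query by a direct product) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's released code; the function names `naive_preprocess` and `naive_query` are hypothetical, and the toy dataset is far smaller than the 5 × 10^4-point datasets used in the paper.

```python
import numpy as np

def naive_preprocess(X):
    """Build the full n x n L1 distance matrix D[i, j] = ||x_i - x_j||_1.

    This O(n^2 d) step counts towards preprocessing in the paper's setup.
    """
    # Broadcast (n, 1, d) - (1, n, d) -> (n, n, d), then sum |.| over d.
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)

def naive_query(D, z):
    """Answer a matrix-vector query: return D @ z in O(n^2) time."""
    return D @ z

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))  # toy stand-in for an n-point dataset in R^50
D = naive_preprocess(X)

# As in the paper's setup, queries are Gaussian vectors.
z = rng.standard_normal(200)
y = naive_query(D, z)
```

The sub-quadratic algorithms in the paper (Algorithms 1 and 2) avoid materializing `D` entirely; the point of this baseline is only to fix what "preprocessing" and "query" mean when timing the comparison.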