Subquadratic Algorithms for Kernel Matrices via Kernel Density Estimation

Authors: Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal, Samson Zhou

ICLR 2023

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
LLM Response: We present empirical evaluations for our algorithms. We chose to evaluate algorithms for low-rank approximation (LRA) and spectral sparsification (and spectral clustering as a corollary), as they are arguably two of the most well-studied examples in our applications and utilize a wide variety of techniques present in our other examples in Sections D and E. Our evaluations serve as a proof of concept that the queries we constructed are efficient and easy to implement in practice.

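For context on the spectral clustering application mentioned above, the following is a minimal quadratic-time baseline sketch, assuming a Gaussian kernel graph over the input points; it uses NumPy/SciPy plus scikit-learn's k-means (my choice, not mentioned in the paper) and explicitly forms the dense n x n kernel matrix that the paper's KDE-based algorithms are designed to avoid.

    # Quadratic-time spectral clustering baseline on a Gaussian kernel graph.
    # It materializes the dense n x n kernel matrix, which is exactly the cost
    # the paper's KDE-based sparsification avoids; sigma and k are illustrative.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.linalg import eigh
    from sklearn.cluster import KMeans

    def baseline_spectral_clustering(X, k, sigma):
        # Gaussian kernel (adjacency) matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
        sq_dists = squareform(pdist(X, metric="sqeuclidean"))
        K = np.exp(-sq_dists / (2.0 * sigma**2))

        # Symmetrically normalized Laplacian L = I - D^{-1/2} K D^{-1/2}.
        d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
        L = np.eye(X.shape[0]) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

        # Embed points with the k eigenvectors of L of smallest eigenvalue,
        # then cluster the spectral embedding with k-means.
        _, vecs = eigh(L, subset_by_index=[0, k - 1])
        return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)

    # Illustrative usage on two synthetic Gaussian clusters.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 20)), rng.normal(5, 1, (100, 20))])
    labels = baseline_spectral_clustering(X, k=2, sigma=1.0)
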
Researcher Affiliation: Academia
LLM Response: Ainesh Bakshi (MIT, ainesh@mit.edu); Piotr Indyk (MIT, indyk@mit.edu); Praneeth Kacham (CMU, pkacham@cs.cmu.edu); Sandeep Silwal (MIT, silwal@mit.edu); Samson Zhou (UC Berkeley and Rice University, samsonzhou@gmail.com).

Pseudocode: Yes
LLM Response: The paper includes several algorithm blocks, for example: 'Algorithm 1 Multi-level KDE Construction', 'Algorithm 8 Spectral Sparsification of the Kernel Graph', and 'Algorithm 9 Additive-error Low-rank Approximation'.

Open Source Code: No
LLM Response: The paper refers to an accessible KDE implementation from prior work: 'We chose to work with the implementation of (Backurs et al., 2019) since it possesses theoretical guarantees, has an accessible implementation, and has been used in experiments in prior works such as (Backurs et al., 2019; 2021).' The accompanying footnote links to https://github.com/talwagner/efficient_kde. However, this is not a statement that the authors' own code for the methods described in this paper is open-sourced.

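To make concrete what the referenced KDE implementation computes, here is a brute-force sketch of the Gaussian kernel density query it approximates; this is an illustrative baseline written for this report, assuming the Gaussian kernel, and is not code from the paper or from the linked repository.

    # Exact (brute-force) kernel density estimate: KDE(q) = (1/n) * sum_i k(x_i, q),
    # shown with the Gaussian kernel k(x, q) = exp(-||x - q||^2 / (2 sigma^2)).
    # Fast KDE data structures such as the linked efficient_kde code approximate
    # this sum without scanning all n points per query.
    import numpy as np

    def exact_gaussian_kde(X, queries, sigma):
        # X: (n, d) data points; queries: (m, d) query points; returns shape (m,).
        sq_dists = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * sigma**2)).mean(axis=1)
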
Open Datasets: Yes
LLM Response: We use two real and two synthetic datasets in our experiments. The datasets used in the low-rank approximation experiments are MNIST (points in R^784) (Le Cun, 1998) and Glove word embeddings (points in R^200) (Pennington et al., 2014).

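Purely as an illustration of how such subsamples might be obtained (the paper does not describe its data pipeline), the sketch below pulls MNIST from OpenML via scikit-learn and reads GloVe vectors from a local file; the GloVe file name is a hypothetical placeholder, and the 10^4-point uniform subsampling mirrors the subsample size quoted in the next entry.

    # Illustrative data loading only, not the authors' pipeline. MNIST is fetched
    # from OpenML through scikit-learn; "glove.6B.200d.txt" is a hypothetical
    # placeholder for a local copy of 200-dimensional GloVe vectors.
    import numpy as np
    from sklearn.datasets import fetch_openml

    rng = np.random.default_rng(0)

    # MNIST: 70,000 points in R^784; keep a uniform subsample of 10^4 points.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X_mnist = mnist.data[rng.choice(mnist.data.shape[0], size=10_000, replace=False)]

    # GloVe: each line is "word v1 ... v200"; collect the vectors and subsample.
    def load_glove(path, n_points, rng):
        vectors = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors.append(np.asarray(parts[1:], dtype=np.float64))
        vectors = np.vstack(vectors)
        return vectors[rng.choice(vectors.shape[0], size=n_points, replace=False)]

    X_glove = load_glove("glove.6B.200d.txt", n_points=10_000, rng=rng)  # hypothetical path
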
Dataset Splits: No
LLM Response: The paper mentions using '10^4 points from each of the test datasets' for MNIST and Glove, and refers to 'test datasets', but does not explicitly provide details about train/validation/test splits or how data was partitioned for training and validation.

Hardware Specification: No
LLM Response: The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or memory specifications).

Software Dependencies: No
LLM Response: The paper states: 'All linear algebra subroutines rely on Numpy, Scipy, and Numba implementations when applicable.' However, it does not specify version numbers for these software components.

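Since no version numbers are reported, a snippet along the following lines (my own suggestion, not something the paper provides) can record the versions of the three named dependencies next to each set of results.

    # Record the versions of the dependencies named in the paper (Numpy, Scipy,
    # Numba) so that later reruns can reproduce the original environment.
    import json
    import numba
    import numpy
    import scipy

    versions = {
        "numpy": numpy.__version__,
        "scipy": scipy.__version__,
        "numba": numba.__version__,
    }
    with open("environment_versions.json", "w") as f:
        json.dump(versions, f, indent=2)
    print(versions)
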
Experiment Setup: No
LLM Response: The paper states: 'For low-rank approximation datasets, we choose the bandwidth value σ according to the choice made in prior experiments, in particular the values given in (Backurs et al., 2019). There, σ is chosen according to the popular median distance rule' and 'Concretely, we sample 25k rows for a rank k approximation, which we fix for all experiments.' However, it does not provide specific hyperparameters like learning rates, batch sizes, or optimizer settings for training.

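The quoted setup corresponds to two concrete steps, sketched below under stated assumptions: the bandwidth σ is set by the median distance rule (median pairwise Euclidean distance over a subsample; the subsample size here is my choice), and "25k rows for a rank k approximation" is read as uniformly sampling 25·k rows, which is an interpretation rather than a detail confirmed by the paper.

    # Sketch of the quoted setup. median_distance_bandwidth implements the
    # popular median distance rule for the Gaussian bandwidth; the subsample
    # size of 1000 points is an assumption. sample_rows_for_rank_k reads
    # "25k rows for a rank k approximation" as 25 * k uniformly sampled rows.
    import numpy as np
    from scipy.spatial.distance import pdist

    def median_distance_bandwidth(X, sample_size=1000, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.choice(X.shape[0], size=min(sample_size, X.shape[0]), replace=False)
        return np.median(pdist(X[idx]))  # median pairwise Euclidean distance

    def sample_rows_for_rank_k(X, k, seed=0):
        rng = np.random.default_rng(seed)
        n_rows = min(25 * k, X.shape[0])
        return rng.choice(X.shape[0], size=n_rows, replace=False)  # sampled row indices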