Supervised Kernel Thinning

Authors: Albert Gong, Kyuseong Choi, Raaz Dwivedi

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our design choices with both simulations and real data experiments.
Researcher Affiliation | Academia | Albert Gong, Kyuseong Choi, Raaz Dwivedi; Cornell Tech, Cornell University ({agong,kc728,dwivedi}@cornell.edu)
Pseudocode | Yes | Algorithm 1: KT-COMPRESS++ (identify coreset of size n...); Algorithm 3b: KT-SWAP (identify and refine the best candidate coreset)
Open Source Code | Yes | Our code can be found at https://github.com/ag2435/npr.
Open Datasets | Yes | California Housing regression dataset from Pace and Barry [17] (https://scikit-learn.org/1.5/datasets/real_world.html#california-housing-dataset; BSD-3-Clause license) and the SUSY binary classification dataset from Baldi et al. [2] (https://archive.ics.uci.edu/dataset/279/susy; CC-BY-4.0 license).
Dataset Splits | Yes | Specifically, we use a held-out validation set of size 10^4 and run each parameter configuration 100 times to estimate the validation MSE, since KT-KRR and ST-KRR are random.
Hardware Specification | Yes | All our experiments were run on a machine with 8 CPU cores and 100 GB RAM.
Software Dependencies | No | The paper mentions a 'Matlab implementation' and a 'Cython implementation' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We select the bandwidth h and regularization parameter λ (for KRR) using grid search. For all methods, we use the Gaussian kernel (23) with bandwidth h = 10. We use λ = 10^-3 for FULL-KRR, ST-KRR, and KT-KRR and λ = 10^-5 for RPCHOLESKY-KRR. All parameters are chosen with cross-validation.
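
The Pseudocode entry names KT-COMPRESS++ and KT-SWAP. As a rough illustration of the swap-refinement idea behind KT-SWAP, the sketch below greedily replaces coreset points whenever a swap lowers the squared kernel MMD to the full sample. This is a simplified paraphrase for intuition, not the authors' Algorithm 3b; the function names are ours, and the naive pairwise kernel matrix limits it to small inputs.

```python
import numpy as np

def gaussian_kernel(A, B, h=10.0):
    # k(x, y) = exp(-||x - y||^2 / (2 h^2)); naive O(len(A) * len(B)) pairwise
    # matrix, so only suitable for small demonstration inputs.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h**2))

def swap_refine(X, idx, h=10.0):
    # Greedy single pass: for each coreset slot, swap in the input point that
    # minimizes the MMD^2 terms depending on that slot (constants dropped).
    K = gaussian_kernel(X, X, h)
    mean_col = K.mean(axis=0)  # (1/n) * sum_j k(x_c, x_j) for each candidate c
    idx = np.array(idx, dtype=int)
    m = len(idx)
    for pos in range(m):
        others = np.delete(idx, pos)
        scores = (np.diag(K) + 2.0 * K[:, others].sum(axis=1)) / m**2 \
                 - 2.0 * mean_col / m
        idx[pos] = int(np.argmin(scores))
    return idx

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
coreset_idx = swap_refine(X, rng.choice(200, size=8, replace=False))
```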
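
The two public datasets listed in the Open Datasets entry can be pulled as follows. fetch_california_housing is a real scikit-learn loader; the SUSY lines assume the UCI file layout (gzipped CSV, no header, 0/1 label in the first column) and a locally downloaded copy.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing

# California Housing regression dataset (Pace and Barry).
housing = fetch_california_housing()
X_housing, y_housing = housing.data, housing.target

# SUSY binary classification dataset, downloaded from
# https://archive.ics.uci.edu/dataset/279/susy (assumed layout: no header,
# label in column 0, features in the remaining columns).
susy = pd.read_csv("SUSY.csv.gz", header=None)
y_susy = susy.iloc[:, 0].to_numpy()
X_susy = susy.iloc[:, 1:].to_numpy()
```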
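
The Dataset Splits entry describes a held-out validation set of size 10^4 with 100 repeated runs per parameter configuration, since the thinned estimators are randomized. A minimal version of that protocol, reusing X_housing from the loading sketch above, with fit_kt_krr as a hypothetical stand-in for the authors' KT-KRR routine:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Held-out validation set of size 10^4, as described in the report.
X_train, X_val, y_train, y_val = train_test_split(
    X_housing, y_housing, test_size=10_000, random_state=0
)

val_mses = []
for seed in range(100):  # 100 runs per configuration, as in the report
    # fit_kt_krr is a hypothetical handle for the authors' KT-KRR fit,
    # not a real API; see https://github.com/ag2435/npr for the actual code.
    model = fit_kt_krr(X_train, y_train, h=10.0, lam=1e-3, seed=seed)
    val_mses.append(np.mean((model.predict(X_val) - y_val) ** 2))

print(f"validation MSE: {np.mean(val_mses):.4f} +/- {np.std(val_mses):.4f}")
```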
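
For the grid search over bandwidth h and regularization λ in the Experiment Setup entry, scikit-learn's KernelRidge can serve as a stand-in for the full-data baseline (FULL-KRR). The Gaussian kernel exp(-||x - x'||^2 / (2 h^2)) maps to scikit-learn's RBF kernel exp(-gamma ||x - x'||^2) via gamma = 1 / (2 h^2); the grid values below are illustrative, anchored at h = 10 and λ in {10^-3, 10^-5} from the quoted excerpt.

```python
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

bandwidths = [1.0, 10.0, 100.0]  # illustrative grid around h = 10
param_grid = {
    "gamma": [1.0 / (2.0 * h**2) for h in bandwidths],
    "alpha": [1e-3, 1e-5],  # KernelRidge's alpha plays the role of lambda
}
search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)
```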