Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Supervised Kernel Thinning
Authors: Albert Gong, Kyuseong Choi, Raaz Dwivedi
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our design choices with both simulations and real data experiments. |
| Researcher Affiliation | Academia | Albert Gong Kyuseong Choi Raaz Dwivedi Cornell Tech, Cornell University agong,kc728,EMAIL |
| Pseudocode | Yes | Algorithm 1: KT-COMPRESS++ Identify coreset of size n... Algorithm 3b: KT-SWAP Identify and refine the best candidate coreset |
| Open Source Code | Yes | Our code can be found at https://github.com/ag2435/npr. |
| Open Datasets | Yes | California Housing regression dataset from Pace and Barry [17] (https://scikit-learn.org/1.5/datasets/real_world.html#california-housing-dataset; BSD-3-Clause license) and the SUSY binary classification dataset from Baldi et al. [2] (https://archive.ics.uci.edu/dataset/279/susy; CC-BY-4.0 license). |
| Dataset Splits | Yes | Specifically, we use a held-out validation set of size 10⁴ and run each parameter configuration 100 times to estimate the validation MSE since KT-KRR and ST-KRR are random. |
| Hardware Specification | Yes | All our experiments were run on a machine with 8 CPU cores and 100 GB RAM. |
| Software Dependencies | No | The paper mentions 'Matlab implementation' and 'Cython implementation' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We select the bandwidth h and regularization parameter λ (for KRR) using grid search. For all methods, we use the Gaussian kernel (23) with bandwidth h = 10. We use λ = 10⁻³ for FULL-KRR, ST-KRR, and KT-KRR and λ = 10⁻⁵ for RPCHOLESKY-KRR. All parameters are chosen with cross-validation. |
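
The experiment setup quoted in the table (Gaussian-kernel KRR, with bandwidth h and regularization λ chosen by grid search on held-out data) can be illustrated with a minimal sketch. This is not the authors' code from the linked repository. Everything in it is an assumption made for illustration: the synthetic sine data, the kernel parameterization exp(-‖x − z‖² / (2h²)), the (K + nλI) regularization convention, the grid values, and the helper names `gaussian_kernel`, `krr_fit`, `krr_predict`, and `grid_search`. The "standard thinning" step is modeled here as a uniform random subsample, and kernel thinning itself (KT-KRR) is not implemented.

```python
import numpy as np


def gaussian_kernel(X, Z, h):
    """Gaussian kernel matrix; the exp(-||x - z||^2 / (2 h^2)) form is an assumption."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * h**2))


def krr_fit(X, y, h, lam):
    """Solve (K + n * lam * I) alpha = y; the n * lam scaling is an assumed convention."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, h)
    return np.linalg.solve(K + n * lam * np.eye(n), y)


def krr_predict(X_train, alpha, X_new, h):
    """Predict with the fitted dual coefficients alpha."""
    return gaussian_kernel(X_new, X_train, h) @ alpha


def grid_search(X_tr, y_tr, X_val, y_val, bandwidths, lams):
    """Return (validation MSE, h, lam) for the best pair on the grid."""
    best = None
    for h in bandwidths:
        for lam in lams:
            alpha = krr_fit(X_tr, y_tr, h, lam)
            pred = krr_predict(X_tr, alpha, X_val, h)
            mse = float(np.mean((pred - y_val) ** 2))
            if best is None or mse < best[0]:
                best = (mse, h, lam)
    return best


# Hypothetical synthetic regression problem (not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(400)
X_tr, y_tr, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

# Illustrative grids; the paper's actual grids are not given in the table.
bandwidths = [0.1, 1.0, 10.0]
lams = [1e-5, 1e-3, 1e-1]

# KRR on all training points.
best_full = grid_search(X_tr, y_tr, X_val, y_val, bandwidths, lams)

# KRR on a uniform random subsample of 60 points (a stand-in for standard thinning).
idx = rng.choice(X_tr.shape[0], size=60, replace=False)
best_st = grid_search(X_tr[idx], y_tr[idx], X_val, y_val, bandwidths, lams)

print("full data: mse=%.4f h=%s lam=%s" % best_full)
print("subsample: mse=%.4f h=%s lam=%s" % best_st)
```

Because the subsample is random, its validation MSE varies with the seed, which is consistent with the table's note that each configuration is run repeatedly to estimate the validation MSE for the randomized methods.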