Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
Authors: Marko Medvedev, Gal Vardi, Nati Srebro
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A Empirical Justification. We plot the dependence of the test error of the Gaussian kernel ridgeless predictor... We ran the experiments on one A6000 GPU. |
| Researcher Affiliation | Academia | Marko Medvedev The University of Chicago EMAIL Gal Vardi Weizmann Institute of Science EMAIL Nathan Srebro TTI-Chicago EMAIL |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The code to reproduce these experiments can be found at https://github.com/marko-medvedev/overfitting-behavior-of-gaussian-kernel-ridgeless-regression. |
| Open Datasets | No | The paper uses synthetic data generated as '$y = f(x) + \xi$ where $\xi \sim \mathcal{N}(0, \sigma^2)$, $f = 10$, $\sigma^2$ is the noise level, and $x \sim \mathrm{Unif}(S^{d-1})$', which is not a publicly available or open dataset. |
| Dataset Splits | No | The paper describes experiments but does not provide specific training/test/validation dataset splits. It mentions running '100 different runs of the experiment' but no explicit data partitioning. |
| Hardware Specification | Yes | We ran the experiments on one A6000 GPU. |
| Software Dependencies | No | The paper provides a link to the code for reproduction but does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Specifically, we consider $y = f(x) + \xi$ where $\xi \sim \mathcal{N}(0, \sigma^2)$, $f = 10$, $\sigma^2$ is the noise level, and $x \sim \mathrm{Unif}(S^{d-1})$. We vary the values of $d$ and $\sigma^2$ and the bandwidth scaling $\tau_m$ as follows: for $\tau_m = o(m^{-\frac{1}{d-1}})$ we take $\sigma^2 = 1$ and $d = 6$, for $\tau_m = \omega(m^{-\frac{1}{d-1}})$ we take $\sigma^2 = 10$ and $d = 4$, and for $\tau_m = \Theta(m^{-\frac{1}{d-1}})$ we take $\sigma^2 = 10000$ and $d = 6$. |
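
The experiment setup quoted in the table can be illustrated with a minimal NumPy sketch. This is not the authors' code (that is in the linked repository) and has not been checked against it. It assumes the kernel takes the form exp(-||x - x'||^2 / tau^2), that test error is measured against the noiseless target f = 10, and that the critical bandwidth scaling reads m^(-1/(d-1)). All function names and the test-set size are hypothetical.

```python
import numpy as np


def sample_sphere(m, d, rng):
    """Draw m points uniformly from the unit sphere S^{d-1} in R^d."""
    x = rng.standard_normal((m, d))
    # Normalizing isotropic Gaussian vectors gives the uniform law on the sphere.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def make_data(m, d, sigma2, rng, f_value=10.0):
    """Generate y = f(x) + xi with constant f and xi ~ N(0, sigma2)."""
    x = sample_sphere(m, d, rng)
    y = f_value + np.sqrt(sigma2) * rng.standard_normal(m)
    return x, y


def gaussian_kernel(a, b, tau):
    """Assumed kernel form: exp(-||a_i - b_j||^2 / tau^2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / tau**2)


def ridgeless_predict(x_train, y_train, x_test, tau):
    """Minimum-norm interpolant: solve K alpha = y with no ridge term."""
    K = gaussian_kernel(x_train, x_train, tau)
    alpha = np.linalg.lstsq(K, y_train, rcond=None)[0]
    return gaussian_kernel(x_test, x_train, tau) @ alpha


def estimate_test_error(m, d, sigma2, tau, n_test=2000, seed=0, f_value=10.0):
    """Monte Carlo estimate of squared error against the noiseless target."""
    rng = np.random.default_rng(seed)
    x, y = make_data(m, d, sigma2, rng, f_value)
    x_test = sample_sphere(n_test, d, rng)
    pred = ridgeless_predict(x, y, x_test, tau)
    return float(np.mean((pred - f_value) ** 2))


if __name__ == "__main__":
    d, sigma2 = 6, 1.0
    for m in (50, 100, 200):
        tau = m ** (-1.0 / (d - 1))  # assumed critical scaling
        print(m, estimate_test_error(m, d, sigma2, tau))
```

Averaging `estimate_test_error` over several seeds would mimic the "100 different runs" mentioned in the table. Adding `sigma2` to the returned value gives the error against fresh noisy labels instead, if that is the intended definition.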