Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression

Authors: Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions. In Figure 3, our experiment demonstrates the GEP, as the learning curves with kernel features (Sine features) and independent features (Gaussian features z ~ N(0, I_p) or Rademacher features z ~ (unif{±1})^p) coincide and match the theoretical decay.
Researcher Affiliation | Academia | Tin Sum Cheng, Aurelien Lucchi, Department of Mathematics and Computer Science, University of Basel, Switzerland (EMAIL, EMAIL); Anastasis Kratsios, Department of Mathematics, McMaster University and The Vector Institute, Ontario, Canada (EMAIL); David Belius, Faculty of Mathematics and Computer Science, UniDistance Suisse, Switzerland (EMAIL)
Pseudocode | No | The paper provides a 'Proof sketch' in Section 4 and a 'flowchart in Figure 4' outlining proof techniques, but it does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code for the experiments is uploaded as supplementary materials.
Open Datasets | No | The experiments are based on synthetically generated data using specified parameters (e.g., 'µ = unif[0, 1]'), not a publicly accessible dataset with explicit access information (link, DOI, formal citation).
Dataset Splits | No | The paper mentions a 'sample size n ranges from 100 to 1000' and discusses 'test error', but it does not specify explicit train/validation/test dataset splits (percentages, counts, or predefined splits) for its synthetic data.
Hardware Specification | Yes | All experiments were conducted on a computer with a 2.3 GHz Quad-Core Intel Core i7 processor.
Software Dependencies | No | The paper does not provide specific software dependencies or library versions (e.g., 'PyTorch 1.9' or 'NumPy 1.20') used for the experiments.
Experiment Setup | Yes | In the following experiment, we choose p = 2000, and the sample size n ranges from 100 to 1000, with ridge parameter λ = (2n^{1/2}π)^{-b} where b ∈ [0, 1 + a].
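The experiment described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' released code: it uses a linear kernel on independent Gaussian features (one of the feature families in the table), a fixed ridge parameter, a simple noisy linear target, and hypothetical values for p, the test-set size, and the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

def krr_test_error(n, p=200, lam=1e-3, n_test=500):
    """Fit kernel ridge regression on synthetic Gaussian features
    z ~ N(0, I_p) and return the mean squared test error."""
    # Training and test features; a noisy linear target (assumed for illustration)
    X = rng.standard_normal((n, p))
    X_test = rng.standard_normal((n_test, p))
    w = rng.standard_normal(p) / np.sqrt(p)
    y = X @ w + 0.1 * rng.standard_normal(n)
    y_test = X_test @ w

    # Linear kernel K = X X^T / p; KRR dual coefficients (K + lam I)^{-1} y
    K = X @ X.T / p
    alpha = np.linalg.solve(K + lam * np.eye(n), y)

    # Predict via the kernel between test and training points
    y_pred = (X_test @ X.T / p) @ alpha
    return float(np.mean((y_test - y_pred) ** 2))

# Learning curve: test error should shrink as the sample size n grows
errs = [krr_test_error(n) for n in (100, 400, 1000)]
```

Sweeping n as above traces the empirical learning curve; the paper additionally varies the ridge decay exponent b and compares kernel features against the independent Gaussian and Rademacher features.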