Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Naive imputation implicitly regularizes high-dimensional linear models
Authors: Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments illustrate our findings. |
| Researcher Affiliation | Academia | ¹Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation (LPSM), F-75005 Paris, France; ²CMAP, UMR7641, École Polytechnique, IP Paris, 91128 Palaiseau, France. |
| Pseudocode | No | Section 4.1 describes the SGD algorithm step-by-step in prose but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any links or explicit statements about releasing open-source code for the described methodology. |
| Open Datasets | No | The paper states 'We generate n = 500 complete input data according to a normal distribution with two different covariance structures.' indicating simulated data, not a publicly available dataset. |
| Dataset Splits | No | The paper describes data simulation and evaluation on test samples but does not specify train/validation/test splits, percentages, or cross-validation methodology. |
| Hardware Specification | No | The paper mentions that regressors are 'implemented in scikit-learn', but it does not specify any hardware details such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions 'implemented in scikit-learn (Pedregosa et al., 2011)' but does not provide specific version numbers for scikit-learn or any other software dependencies. |
| Experiment Setup | Yes | Under Assumption 4, choosing a constant learning rate γ = 1/(κ Tr(Σ) √n) leads to... and with starting point θ₀ = 0 and learning rate γ = 1/(d κ L² √n), satisfies... |
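
The setup summarized in the table (simulated Gaussian inputs with n = 500, SGD started at θ₀ = 0 with a constant learning rate) can be illustrated with a minimal sketch. This is not the authors' code, which the paper does not release. The learning rate is read here as γ = 1/(κ Tr(Σ) √n), assuming the radical was lost in text extraction. The dimension `d`, the observation probability `rho`, the noise level, the isotropic covariance, the default `kappa = 1.0`, and the use of the empirical second moment of the zero-imputed inputs as a stand-in for Tr(Σ) are all assumptions made for illustration.

```python
import numpy as np


def averaged_sgd_zero_imputed(X_imp, y, kappa=1.0):
    """One pass of averaged SGD for least squares on zero-imputed inputs.

    kappa is a hypothetical constant; Tr(Sigma) is approximated by the
    empirical mean of ||x||^2 over the imputed rows (an assumption).
    """
    n, d = X_imp.shape
    trace_sigma = np.mean(np.sum(X_imp ** 2, axis=1))
    gamma = 1.0 / (kappa * trace_sigma * np.sqrt(n))  # constant learning rate

    theta = np.zeros(d)      # starting point theta_0 = 0
    theta_bar = np.zeros(d)  # running average of the iterates
    for t in range(n):
        x_t, y_t = X_imp[t], y[t]
        grad = (x_t @ theta - y_t) * x_t  # gradient of 0.5 * (x^T theta - y)^2
        theta = theta - gamma * grad
        theta_bar += (theta - theta_bar) / (t + 1)
    return theta_bar, gamma


# Simulated data: n = 500 as in the table; everything else is hypothetical.
rng = np.random.default_rng(0)
n, d, rho = 500, 20, 0.5
X = rng.standard_normal((n, d))                    # isotropic covariance (assumed)
theta_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

observed = rng.random((n, d)) < rho                # True where the entry is observed
X_imp = np.where(observed, X, 0.0)                 # naive imputation by zero

theta_bar, gamma = averaged_sgd_zero_imputed(X_imp, y)
```

The small step size combined with iterate averaging is what the sketch is meant to show. It makes no claim about matching the paper's reported numbers.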