Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Authors: Mo Zhou, Rong Ge

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we run synthetic experiments to verify our theoretical results. We choose d from 100 to 10^6 and set n = 3 sqrt(d).
Researcher Affiliation | Academia | Department of Computer Science, Duke University, Durham, NC, US.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of source code for the methodology described.
Open Datasets | No | The paper states: "data x_i ~ N(0, I) sampled from Gaussian distribution". This indicates synthetic data generation, but no link, DOI, or formal citation for a publicly available dataset is provided.
Dataset Splits | No | The paper describes synthetic data generation and mentions "training loss" and "test loss" but does not specify explicit training, validation, or test dataset splits (e.g., percentages, counts, or predefined splits).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions "Lasso (implemented in sklearn)" but does not give version numbers for sklearn or any other software dependency, which would be needed for exact reproducibility.
Experiment Setup | Yes | We set lambda = 100d/sigma * n log(n) (sqrt(log(d)/n) + sqrt(n/d)) and run gradient descent with stepsize eta = 10^-6 until training loss reaches 10^-4.
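
The setup rows above pin down the synthetic experiment only partially. The Python sketch below assembles the quoted pieces (Gaussian data x_i ~ N(0, I), n = 3 sqrt(d), a Lasso baseline via sklearn, and gradient descent with stepsize 1e-6 stopped once the training loss reaches 1e-4) under stated assumptions: the sparsity level, the noise scale sigma, the operator grouping of the lambda expression, the definition of "training loss", and the use of plain least-squares gradient descent in place of the paper's (unspecified in this excerpt) parameterization are all assumptions, not details taken from the paper.

```python
# Minimal sketch of the quoted synthetic setup; not the authors' code.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

d = 100                  # ambient dimension; the paper sweeps d from 100 to 10^6
n = int(3 * np.sqrt(d))  # sample size n = 3 sqrt(d)
k, sigma = 5, 0.5        # assumed sparsity and noise level (not given in the excerpt)

# Synthetic data: x_i ~ N(0, I), a k-sparse ground truth, Gaussian label noise.
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:k] = 1.0
y = X @ w_star + sigma * rng.standard_normal(n)

# Lasso baseline via sklearn. The quoted lambda is transcribed with an assumed
# grouping, and sklearn's `alpha` may be normalized differently from the
# paper's lambda, so treat this value as a placeholder.
lam = 100 * d / (sigma * n * np.log(n)) * (np.sqrt(np.log(d) / n) + np.sqrt(n / d))
lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=100_000).fit(X, y)

# Gradient descent on the plain least-squares loss (a stand-in for the paper's
# parameterization) with stepsize eta = 1e-6, stopped once the training loss
# (taken here to be half the mean squared residual) drops below 1e-4. At this
# stepsize the loop needs on the order of 1e8 steps; raise eta for a quick
# smoke test. The iteration cap is a safeguard, not part of the quoted setup.
eta, tol, w = 1e-6, 1e-4, np.zeros(d)
for _ in range(200_000_000):
    resid = X @ w - y
    if 0.5 * np.mean(resid ** 2) <= tol:
        break
    w -= eta * (X.T @ resid) / n

# For isotropic Gaussian x, the excess test risk of an estimator w is ||w - w*||^2.
print("GD excess risk:   ", float(np.sum((w - w_star) ** 2)))
print("Lasso excess risk:", float(np.sum((lasso.coef_ - w_star) ** 2)))
```

The dense arrays above are only meant for the smallest setting (d = 100); scaling d toward 10^6 with n = 3 sqrt(d), as the paper does, would require far more memory and a more careful solver than this sketch.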