Implicit Sparse Regularization: The Impact of Depth and Early Stopping

Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of simulation experiments to further illuminate our theoretical findings. Our simulation setup is described as follows. The entries of X are sampled as i.i.d. Rademacher random variables and the entries of the noise vector are i.i.d. N(0, σ²) random variables. We let w⋆ = γ1_S. The values for the simulation parameters are: n = 500, p = 3000, k = 5, γ = 1, σ = 0.5 unless otherwise specified. For ℓ2-plots, each simulation is repeated 30 times, and the median ℓ2 error is depicted. The shaded area indicates the region between the 25th and 75th percentiles pointwise. (A data-generation sketch for this setup follows the table.)
Researcher Affiliation | Academia | Jiangyuan Li (jiangyuanli@tamu.edu, Texas A&M University); Thanh V. Nguyen (thanhng.cs@gmail.com); Chinmay Hegde (chinmay.h@nyu.edu, New York University); Raymond K. W. Wong (raywong@tamu.edu, Texas A&M University)
Pseudocode | No | The paper describes the gradient descent update rule in mathematical form (equation 2) but does not present it as a formal pseudocode block or algorithm. (An illustrative sketch of such an update appears after the table.)
Open Source Code | Yes | The code is available on https://github.com/jiangyuan2li/Implicit-Sparse-Regularization.
Open Datasets | Yes | The entries of X are sampled as i.i.d. Rademacher random variables and the entries of the noise vector are i.i.d. N(0, σ²) random variables. We let w⋆ = γ1_S. ... and we defer the result on MNIST to Appendix E.
Dataset Splits | No | The paper provides simulation parameters (n, p, k, γ, σ) and mentions that 'each simulation is repeated 30 times,' but it does not specify explicit training, validation, or test splits for the simulated data or for the MNIST dataset mentioned.
Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., CPU, GPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions that 'The code is available on https://github.com/jiangyuan2li/Implicit-Sparse-Regularization.' However, it does not specify any software dependencies with version numbers in the text (e.g., Python version, or specific library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | The values for the simulation parameters are: n = 500, p = 3000, k = 5, γ = 1, σ = 0.5 unless otherwise specified. For ℓ2-plots, each simulation is repeated 30 times, and the median ℓ2 error is depicted. The shaded area indicates the region between the 25th and 75th percentiles pointwise. ... We choose different values of N to illustrate the convergence of the algorithm. ... We intentionally pick a relatively large α = 2 × 10⁻³ where the algorithm fails to converge for N = 2. With the same initialization, the recovery manifests as N increases (Figure 3). ... Note that for both Figures 1 and 4, we set n = 100 and p = 200. Since α^N would decrease quickly with N, causing the algorithm to take a large number of iterations to escape from the small region, we fix α^N = 10⁻⁵ instead of fixing α for Figure 4. ... The initialization is α = 10⁻⁴ and the step size is η = 10⁻³ for all N. (A sketch of this experiment protocol follows the table.)
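
The simulation setup quoted in the Research Type and Experiment Setup rows can be reproduced with a short script. Below is a minimal sketch in Python/NumPy; the function name, seeding, and return signature are ours and are not taken from the authors' repository.

```python
import numpy as np

def make_sparse_regression(n=500, p=3000, k=5, gamma=1.0, sigma=0.5, seed=0):
    """Generate one simulated instance matching the setup quoted above."""
    rng = np.random.default_rng(seed)
    # Design matrix with i.i.d. Rademacher (+1/-1) entries.
    X = rng.choice([-1.0, 1.0], size=(n, p))
    # k-sparse ground truth w* = gamma * 1_S on a random support S.
    S = rng.choice(p, size=k, replace=False)
    w_star = np.zeros(p)
    w_star[S] = gamma
    # Noisy observations y = X w* + xi, with xi ~ N(0, sigma^2 I).
    y = X @ w_star + sigma * rng.standard_normal(n)
    return X, y, w_star, S
```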
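The Pseudocode row notes that the update rule of equation (2) is only stated mathematically. The sketch below implements one plausible reading of it: plain gradient descent on the depth-N Hadamard parameterization w = u^N − v^N (elementwise powers) of the least-squares loss, with both factors initialized at α·1. The 1/(2n) loss scaling and the returned error trajectory are our choices, not quotes from the paper.

```python
import numpy as np

def gd_depth_n(X, y, w_star, N=3, alpha=1e-4, eta=1e-3, T=10_000):
    """Run gradient descent on (u, v) with w = u**N - v**N and record
    the l2 recovery error ||w_t - w_star||_2 at every iteration."""
    n, p = X.shape
    u = np.full(p, float(alpha))
    v = np.full(p, float(alpha))
    errors = []
    for _ in range(T):
        w = u ** N - v ** N
        r = y - X @ w                     # residual of the least-squares loss
        g = (N / n) * (X.T @ r)           # shared gradient factor
        u = u + eta * g * u ** (N - 1)    # gradient step on u
        v = v - eta * g * v ** (N - 1)    # gradient step on v
        errors.append(np.linalg.norm(u ** N - v ** N - w_star))
    return np.asarray(errors)
```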
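Finally, the Experiment Setup row describes the repetition protocol (30 runs, median ℓ2 error with 25th/75th percentile bands) and the choice of fixing α^N across depths for Figure 4. A sketch of that loop, reusing the two helpers above, is given below; the particular depths and iteration count are illustrative, not the authors' exact settings.

```python
import numpy as np

def run_depth_experiment(N, n=100, p=200, reps=30, eta=1e-3, T=10_000):
    """Repeat the simulation `reps` times for a given depth N and return the
    pointwise median, 25th, and 75th percentiles of the l2 error curves."""
    alpha = (1e-5) ** (1.0 / N)   # fix alpha**N = 1e-5, so the effective
                                  # initialization is comparable across depths
    curves = []
    for seed in range(reps):
        X, y, w_star, _ = make_sparse_regression(n=n, p=p, seed=seed)
        curves.append(gd_depth_n(X, y, w_star, N=N, alpha=alpha, eta=eta, T=T))
    curves = np.vstack(curves)
    return (np.median(curves, axis=0),
            np.percentile(curves, 25, axis=0),
            np.percentile(curves, 75, axis=0))

# Example usage: compare depths (values illustrative).
for N in (2, 3, 5):
    med, q25, q75 = run_depth_experiment(N)
    print(f"N={N}: best median l2 error over iterations = {med.min():.3f}")
```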