On the Double Descent of Random Features Models Trained with SGD

Authors: Fanghui Liu, Johan Suykens, Volkan Cevher

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical results demonstrate that, with SGD training, RF regression still generalizes well for interpolation learning, and is able to characterize the double descent behavior by the unimodality of variance and monotonic decrease of bias. Besides, we also prove that the constant step-size SGD setting incurs no loss in convergence rate when compared to the exact minimum-norm interpolator, as a theoretical justification of using SGD in practice. Our empirical evaluations support our theoretical results and findings.
Researcher Affiliation | Academia | Fanghui Liu (LIONS, EPFL, fanghui.liu@epfl.ch); Johan A.K. Suykens (ESAT-STADIUS, KU Leuven, johan.suykens@esat.kuleuven.be); Volkan Cevher (LIONS, EPFL, volkan.cevher@epfl.ch)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides neither an explicit statement nor a URL for accessing the source code of the methodology described in the main body. Although the checklist indicates code availability, no concrete access information is given in the paper itself.
Open Datasets | Yes | Here we evaluate the test mean square error (MSE) of RFF regression on the MNIST data set [52]
Dataset Splits | No | The paper mentions 'n = 600 for training' and evaluates 'test mean square error', but does not explicitly provide train/validation/test splits or their percentages/counts. There is no mention of a separate validation set.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiments.
Experiment Setup | Yes | Experimental settings: We take digit 3 vs. 7 as an example, and randomly select 300 training data in these two classes, resulting in n = 600 for training. ... The Gaussian kernel k(x, x') = exp(−‖x − x'‖²/(2σ₀²)) is used, where the kernel width σ₀ is chosen as σ₀² = d in high dimensional settings... In our experiment, the initial step-size is set to γ₀ = 1 and we take the initial point θ₀ near the min-norm solution, corrupted with zero-mean, unit-variance Gaussian noise. The experiments are repeated 10 times...
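
The quoted setup can be illustrated with a small script. The following is a minimal sketch of RFF regression trained with constant step-size SGD, not the authors' code: synthetic Gaussian blobs stand in for the MNIST 3-vs-7 subset, and the number of random features m, the single SGD pass, and the absence of iterate averaging are illustrative assumptions. The kernel width σ₀² = d, the step size γ₀ = 1, n = 600, and the initialization near the min-norm interpolator follow the quoted settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the MNIST 3-vs-7 task (300 points per class, n = 600).
# Two Gaussian blobs of the same dimensionality are used so the sketch runs
# without downloading MNIST; swap in the real digits to reproduce the paper's setup.
def make_data(n_per_class, d):
    X = np.vstack([rng.normal(-0.5, 1.0, size=(n_per_class, d)),
                   rng.normal(+0.5, 1.0, size=(n_per_class, d))])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y

d = 784
X_train, y_train = make_data(300, d)   # n = 600 training points
X_test,  y_test  = make_data(300, d)

# Random Fourier features approximating the Gaussian kernel
# k(x, x') = exp(-||x - x'||^2 / (2 * sigma0^2)) with sigma0^2 = d.
m = 2000                               # number of random features (assumed value)
sigma0 = np.sqrt(d)
W = rng.normal(0.0, 1.0 / sigma0, size=(d, m))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

def rff(X):
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

Phi = rff(X_train)

# Exact minimum-norm interpolator, used as the reference solution and to build
# the SGD starting point.
theta_mn = np.linalg.pinv(Phi) @ y_train

# Constant step-size SGD started near the min-norm solution, corrupted with
# zero-mean, unit-variance Gaussian noise (as in the quoted setup).
gamma0 = 1.0
theta = theta_mn + rng.normal(0.0, 1.0, size=m)
n = len(y_train)
for t in range(n):                     # a single pass over the training set
    i = rng.integers(n)
    phi_i = Phi[i]
    theta -= gamma0 * (phi_i @ theta - y_train[i]) * phi_i

Phi_test = rff(X_test)
for name, th in [("min-norm", theta_mn), ("SGD", theta)]:
    mse = np.mean((Phi_test @ th - y_test) ** 2)
    print(f"{name:8s} test MSE: {mse:.4f}")
```

Sweeping m across the interpolation threshold (m ≈ n) and plotting the resulting test MSE is one way to trace the double descent curve the paper analyzes; averaging over 10 repetitions, as in the quoted settings, smooths the curve.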