Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression

Authors: Jing Xu, Jiaye Teng, Yang Yuan, Andrew Yao

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we conduct various experiments to verify the theoretical results and demonstrate the benefits of early stopping. In this section, we provide numerical studies of overparameterized linear regression problems.
Researcher Affiliation | Academia | Jing Xu, IIIS, Tsinghua University (xujing21@mails.tsinghua.edu.cn); Jiaye Teng, IIIS, Tsinghua University (tjy20@mails.tsinghua.edu.cn); Yang Yuan, IIIS, Tsinghua University, Shanghai Artificial Intelligence Laboratory, and Shanghai Qi Zhi Institute (yuanyang@tsinghua.edu.cn); Andrew Chi-Chih Yao, IIIS, Tsinghua University, Shanghai Artificial Intelligence Laboratory, and Shanghai Qi Zhi Institute (andrewcyao@tsinghua.edu.cn)
Pseudocode | No | The paper provides mathematical formulas for its algorithms, such as the gradient descent update rule 'θ_{t+1} = θ_t - (λ/n) X^T (Xθ_t - Y)', but it does not include any formal pseudocode blocks or algorithm listings. (A sketch of this update appears below the table.)
Open Source Code | No | The paper does not provide any statements about releasing open-source code or links to a code repository for the methodology described.
Open Datasets | Yes | To gain insight into the interplay between data and algorithms, we provide motivating examples of a synthetic overparameterized linear regression task and a classification task on the corrupted MNIST dataset in Figure 1. The MNIST experiment details are described below. We create a noisy version of MNIST with a label noise rate of 20%, i.e., randomly perturbing the label with probability 20% for each training example, to simulate the label noise that is common in real datasets, e.g., ImageNet [60, 65, 73]. (A corruption sketch appears below the table.)
Dataset Splits | No | The paper states that 'one can pick up the best parameter on the training trajectory, by calculating its loss on a validation dataset', indicating the use of a validation set, but it does not specify the size or proportion of this validation split. (An early-stopping sketch appears below the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or other computational resources.
Software Dependencies | No | The paper mentions software components such as a 'vanilla SGD optimizer without momentum or weight decay' and the 'standard cross entropy loss as the loss function', but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We use a vanilla SGD optimizer without momentum or weight decay. The initial learning rate is set to 0.5 and is decayed by 0.98 every epoch. Each model is trained for 300 epochs. The training batch size is set to 1024, and the test batch size is set to 1000. We choose the standard cross entropy loss as the loss function. (A training-loop sketch with these settings appears below.)
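
For concreteness, here is a minimal NumPy sketch of the gradient descent update quoted in the Pseudocode row. The data dimensions, learning rate, and step count are illustrative assumptions rather than values from the paper.

```python
import numpy as np

# Overparameterized regime: more features (d) than samples (n); both values are illustrative.
n, d = 50, 500
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
Y = X @ theta_star + 0.1 * rng.standard_normal(n)

lam = 0.01           # learning rate (lambda in the paper's notation); assumed value
theta = np.zeros(d)  # parameter vector, initialized at zero

# Gradient descent on the least-squares loss:
# theta_{t+1} = theta_t - (lam / n) * X^T (X theta_t - Y)
for t in range(1000):
    theta -= (lam / n) * X.T @ (X @ theta - Y)

print("training MSE:", np.mean((X @ theta - Y) ** 2))
```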
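
The 20% label-noise corruption described in the Open Datasets row could be reproduced roughly as follows. The paper only says labels are perturbed with probability 20%, so resampling uniformly over the other nine classes is an assumption.

```python
import torch
from torchvision import datasets, transforms

noise_rate = 0.2  # label noise rate quoted in the paper
train_set = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())

labels = train_set.targets.clone()
flip_mask = torch.rand(len(labels)) < noise_rate  # perturb each label with probability 20%
num_flipped = int(flip_mask.sum())
# Shift each flipped label by a nonzero offset so it lands on a different class
# (assumed scheme; the paper does not specify how perturbed labels are drawn).
labels[flip_mask] = (labels[flip_mask] + torch.randint(1, 10, (num_flipped,))) % 10
train_set.targets = labels
```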
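
The Experiment Setup row translates fairly directly into a PyTorch training loop. The sketch below uses the quoted hyperparameters (vanilla SGD, initial learning rate 0.5, 0.98 decay per epoch, 300 epochs, batch sizes 1024 and 1000, cross entropy loss); the small MLP is a placeholder assumption, since the excerpt does not name the network architecture.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder architecture; the paper excerpt does not specify the model.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))
criterion = nn.CrossEntropyLoss()                      # standard cross entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.5)      # vanilla SGD: no momentum, no weight decay
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)  # decay lr by 0.98 each epoch

train_set = datasets.MNIST("./data", train=True, download=True, transform=transforms.ToTensor())
test_set = datasets.MNIST("./data", train=False, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=1024, shuffle=True)  # training batch size 1024
test_loader = DataLoader(test_set, batch_size=1000)                  # test batch size 1000

for epoch in range(300):                               # 300 epochs
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```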
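
Finally, for the validation-based selection mentioned in the Dataset Splits row, picking the best parameters along the training trajectory might look like the helper below. The function name and the existence of a held-out validation loader are assumptions; the paper does not report the split size.

```python
import torch

@torch.no_grad()
def pick_best_on_trajectory(checkpoints, model, val_loader, criterion):
    """Return the saved state_dict with the lowest validation loss.

    `checkpoints` is a list of state_dicts collected along the training
    trajectory; the loader, criterion, and helper name are assumptions.
    """
    best_loss, best_state = float("inf"), None
    for state in checkpoints:
        model.load_state_dict(state)
        model.eval()
        total, count = 0.0, 0
        for x, y in val_loader:
            total += criterion(model(x), y).item() * len(y)
            count += len(y)
        if total / count < best_loss:
            best_loss, best_state = total / count, state
    return best_state, best_loss
```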