Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression
Authors: Jing Xu, Jiaye Teng, Yang Yuan, Andrew Yao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we conduct various experiments to verify the theoretical results and demonstrate the benefits of early stopping. In this section, we provide numerical studies of overparameterized linear regression problems. |
| Researcher Affiliation | Academia | Jing Xu, IIIS, Tsinghua University (xujing21@mails.tsinghua.edu.cn); Jiaye Teng, IIIS, Tsinghua University (tjy20@mails.tsinghua.edu.cn); Yang Yuan, IIIS, Tsinghua University, Shanghai Artificial Intelligence Laboratory, Shanghai Qi Zhi Institute (yuanyang@tsinghua.edu.cn); Andrew Chi-Chih Yao, IIIS, Tsinghua University, Shanghai Artificial Intelligence Laboratory, Shanghai Qi Zhi Institute (andrewcyao@tsinghua.edu.cn) |
| Pseudocode | No | The paper provides mathematical formulas for its algorithms, such as the update rule for gradient descent, θ_{t+1} = θ_t − (λ/n) Xᵀ(Xθ_t − Y), but it does not include any formal pseudocode blocks or algorithm listings. (A runnable sketch of this update appears below the table.) |
| Open Source Code | No | The paper does not provide any statements about releasing open-source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | To gain insight into the interplay between data and algorithms, we provide motivating examples of a synthetic overparameterized linear regression task and a classification task on the corrupted MNIST dataset in Figure 1. The MNIST experiment details are described below. We create a noisy version of MNIST with a 20% label-noise rate, i.e., randomly perturbing the label with probability 20% for each training example, to simulate the label noise that is common in real datasets, e.g., ImageNet [60, 65, 73]. (A label-corruption sketch appears below the table.) |
| Dataset Splits | No | The paper states that 'one can pick up the best parameter on the training trajectory, by calculating its loss on a validation dataset', indicating that a validation set is used, but it does not specify the size or proportion of this validation set within the dataset splits. (This trajectory-selection rule is illustrated in the first sketch below.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or other computational resources. |
| Software Dependencies | No | The paper mentions software components like 'vanilla SGD optimizer without momentum or weight decay' and 'standard cross entropy loss as the loss function', but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We use a vanilla SGD optimizer without momentum or weight decay. The initial learning rate is set to 0.5 and is decayed by 0.98 every epoch. Each model is trained for 300 epochs. The training batch size is set to 1024, and the test batch size is set to 1000. We choose the standard cross-entropy loss as the loss function. (A configuration sketch appears below the table.) |
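
The gradient-descent update quoted in the Pseudocode row, combined with the validation-based selection mentioned in the Dataset Splits row, can be written out in a few lines. The sketch below is our own minimal NumPy illustration on a synthetic overparameterized problem; the dimensions, noise level, and step size are illustrative choices, not values from the paper.

```python
import numpy as np

# Minimal sketch of the quoted update rule,
#   theta_{t+1} = theta_t - (lambda/n) * X^T (X theta_t - Y),
# on a synthetic overparameterized problem (d > n), with validation-based
# early stopping: keep the iterate with the lowest validation loss along
# the training trajectory. All sizes below are illustrative, not the paper's.
rng = np.random.default_rng(0)
n, n_val, d = 50, 50, 200                     # overparameterized: d > n
theta_star = rng.normal(size=d) / np.sqrt(d)  # ground-truth parameter
X = rng.normal(size=(n, d))
Y = X @ theta_star + 0.5 * rng.normal(size=n)           # noisy training labels
X_val = rng.normal(size=(n_val, d))
Y_val = X_val @ theta_star + 0.5 * rng.normal(size=n_val)

theta = np.zeros(d)
lr = 0.01                                     # step size (the paper's lambda)
best_theta, best_val = theta, np.inf
for t in range(2000):
    theta = theta - lr / n * X.T @ (X @ theta - Y)      # the quoted GD update
    val_loss = np.mean((X_val @ theta - Y_val) ** 2)
    if val_loss < best_val:                             # early stopping: remember
        best_val, best_theta = val_loss, theta          # the best iterate so far
# best_theta is the early-stopped solution; theta is the final iterate
```

On runs like this, the validation loss typically decreases and then rises again as the iterate starts to fit the label noise, which is why selecting the best parameter on the trajectory can beat the final iterate.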
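The Open Datasets row describes corrupting MNIST by randomly perturbing each training label with probability 20%. The paper does not say whether a perturbed label may coincide with the original one; the hypothetical helper below uses the common convention of drawing a uniformly random different class.

```python
import numpy as np

def corrupt_labels(labels, noise_rate=0.2, num_classes=10, seed=0):
    """Randomly perturb each label with probability `noise_rate`.
    Replacement labels are drawn uniformly from the *other* classes
    (one common convention; the paper does not specify the variant)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(len(labels)) < noise_rate          # which labels to perturb
    offsets = rng.integers(1, num_classes, size=flip.sum())
    labels[flip] = (labels[flip] + offsets) % num_classes
    return labels

# usage: noisy_train_labels = corrupt_labels(mnist_train_labels, noise_rate=0.2)
```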
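Finally, the Experiment Setup row pins down the optimizer, schedule, and batch sizes but not the model. The PyTorch sketch below wires those reported hyperparameters together; the placeholder one-layer model and the synthetic tensors standing in for noisy MNIST are our assumptions, not details from the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the noisy MNIST training set (the real loader is
# omitted); the one-layer model is a placeholder, not the paper's network.
images = torch.randn(4096, 1, 28, 28)
labels = torch.randint(0, 10, (4096,))
train_loader = DataLoader(TensorDataset(images, labels),
                          batch_size=1024, shuffle=True)  # reported train batch size

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(28 * 28, 10))
# Vanilla SGD, no momentum or weight decay, initial lr 0.5, as reported.
optimizer = torch.optim.SGD(model.parameters(), lr=0.5,
                            momentum=0.0, weight_decay=0.0)
# Decay the learning rate by 0.98 every epoch, as reported.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)
criterion = torch.nn.CrossEntropyLoss()                   # standard cross entropy

for epoch in range(300):                                  # 300 epochs, as reported
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()                                      # per-epoch lr decay
```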