How Data Augmentation affects Optimization for Linear Regression

Authors: Boris Hanin, Yi Sun

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate Theorems 4.1 and 4.2, we ran augmented GD and SGD with additive Gaussian noise on N = 100 simulated datapoints. ... Figure 4.1 shows MSE and ‖W_t‖_F along a single optimization trajectory with different schedules for the variance σ_t^2 used in Gaussian noise augmentation.
Researcher Affiliation | Academia | Boris Hanin, Department of Operations Research and Financial Engineering, Princeton University, bhanin@princeton.edu; Yi Sun, Department of Statistics, University of Chicago, yisun@statistics.uchicago.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Complete code to generate this figure is provided as supplement.zip in the supplementary material.
Open Datasets | No | The paper states 'N = 100 simulated datapoints' and 'Inputs were i.i.d. Gaussian vectors in dimension n = 400', indicating that the data was generated for the experiments rather than drawn from a publicly accessible dataset.
Dataset Splits | No | The paper mentions running experiments on 'simulated datapoints' but does not provide details about training, validation, or test splits, percentages, or sample counts.
Hardware Specification | No | The paper mentions 'It ran in 30 minutes on a standard laptop CPU.' This is a general statement and does not give specific hardware details such as CPU model, GPU model, or memory.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The learning rate followed a fixed polynomially decaying schedule η_t = 0.005 / (100 · batch_size · (1 + t/20)^0.66), and the batch size used for SGD was 20.
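
The quoted learning-rate schedule is concrete enough to transcribe directly. A minimal Python sketch, assuming t counts optimization steps from 0 and the stated batch size of 20:

```python
def lr(t, batch_size=20):
    # eta_t = 0.005 / (100 * batch_size) / (1 + t / 20)**0.66,
    # transcribing the schedule quoted in the Experiment Setup row.
    return 0.005 / (100 * batch_size) / (1 + t / 20) ** 0.66
```

At t = 0 this gives η_0 = 0.005 / 2000 = 2.5e-6, decaying polynomially as t grows.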
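
Combining the rows above, here is a minimal end-to-end sketch of the described experiment: N = 100 simulated datapoints with i.i.d. Gaussian inputs in dimension n = 400, SGD with batch size 20 under additive Gaussian noise augmentation of the inputs, and MSE and ‖W_t‖_F logged along a single trajectory. The ground-truth weights, the variance schedule σ_t^2, and the number of steps are hypothetical placeholders, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, as in the Open Datasets row:
# N = 100 datapoints with i.i.d. Gaussian inputs in dimension n = 400.
N, n = 100, 400
X = rng.standard_normal((N, n))
w_star = rng.standard_normal(n) / np.sqrt(n)  # hypothetical ground-truth weights
y = X @ w_star                                # noiseless targets (an assumption)

batch_size = 20

def lr(t):
    # Schedule from the previous sketch.
    return 0.005 / (100 * batch_size) / (1 + t / 20) ** 0.66

def sigma2(t):
    # Variance schedule sigma_t^2 for the noise augmentation; the paper
    # compares several schedules, and this decaying one is a placeholder.
    return 1.0 / (1.0 + t / 20.0)

w = np.zeros(n)
for t in range(2000):
    idx = rng.choice(N, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Additive Gaussian noise augmentation of the minibatch inputs.
    Xb_aug = Xb + np.sqrt(sigma2(t)) * rng.standard_normal(Xb.shape)
    grad = Xb_aug.T @ (Xb_aug @ w - yb) / batch_size  # squared-loss gradient
    w -= lr(t) * grad
    if t % 200 == 0:
        mse = np.mean((X @ w - y) ** 2)  # MSE on the clean data
        print(f"t={t:4d}  MSE={mse:.4f}  ||W_t||_F={np.linalg.norm(w):.4f}")
```

The augmented gradient is just the ordinary squared-loss gradient evaluated on the noised inputs; in expectation over the noise this adds a σ_t^2·‖w‖² ridge-like penalty to the loss, a classical property of input-noise augmentation for linear regression.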