Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
Authors: Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental (see the bias-variance decomposition note after the table) | We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. ... We corroborate these empirical results with a theoretical analysis of two-layer linear networks... |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Department of Statistics, University of California, Berkeley. |
| Pseudocode | Yes (see the estimation sketch after the table) | Algorithm 1 Estimating Generalized Variance |
| Open Source Code | Yes | Our code can be found at https://github.com/yaodongyu/Rethink-BiasVariance-Tradeoff. |
| Open Datasets | Yes | We trained a ResNet34 (He et al., 2016) on the CIFAR10 dataset (Krizhevsky et al., 2009). ... In addition to CIFAR10, we study bias and variance on MNIST (LeCun, 1998) and Fashion-MNIST (Xiao et al., 2017). |
| Dataset Splits | No | The paper describes its training and test sets but does not explicitly mention a separate 'validation' set or specific split percentages for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions optimizers (SGD) and loss functions (squared error, cross-entropy) but does not specify any software library names with version numbers (e.g., TensorFlow, PyTorch, scikit-learn) required for replication. |
| Experiment Setup | Yes (see the training-configuration sketch after the table) | We trained using stochastic gradient descent (SGD) with momentum 0.9. The initial learning rate is 0.1. We applied stage-wise training (decay learning rate by a factor of 10 every 200 epochs), and used weight decay 5 x 10^-4. |
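
The bias and variance discussed in the Research Type row refer to the standard decomposition of the expected squared error of a learned predictor over random draws of the training set. As a note for readers (our summary, not quoted from the paper):

```latex
% Expected squared error of a predictor f(x; T) trained on a random
% training set T, evaluated at a test point (x, y):
\[
\mathbb{E}_{T}\bigl[\lVert f(x;T)-y\rVert^{2}\bigr]
= \underbrace{\lVert \mathbb{E}_{T}[f(x;T)]-y\rVert^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}_{T}\bigl[\lVert f(x;T)-\mathbb{E}_{T}[f(x;T)]\rVert^{2}\bigr]}_{\text{variance}}
\]
```

The paper's finding is that, as network width grows, the first term decreases monotonically while the second term rises and then falls.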
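
The "Algorithm 1 Estimating Generalized Variance" entry refers to the paper's split-based estimator: train models on disjoint halves of repeatedly reshuffled training data and measure how much their predictions disagree. A minimal sketch of that idea, assuming squared error, vector-valued predictions (e.g. against one-hot targets), and a user-supplied `train_model` function whose name and signature are our assumptions rather than the released code's API:

```python
import numpy as np

def estimate_bias_variance(train_model, X_train, y_train, X_test, y_test,
                           n_splits=3, seed=0):
    """Split-based bias/variance estimate for squared error (sketch).

    train_model(X, y) is assumed to return a callable that maps test inputs
    to predictions of shape (num_test, num_classes).
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        # Train one model on each disjoint half of the shuffled training set.
        for half in (idx[: n // 2], idx[n // 2:]):
            model = train_model(X_train[half], y_train[half])
            preds.append(model(X_test))
    preds = np.stack(preds)            # (num_models, num_test, num_classes)
    mean_pred = preds.mean(axis=0)     # pointwise average predictor
    # Unbiased sample variance across models, summed over output dimensions
    # and averaged over test points.
    variance = preds.var(axis=0, ddof=1).sum(axis=-1).mean()
    # Squared bias of the average predictor against the (one-hot) targets.
    bias_sq = ((mean_pred - y_test) ** 2).sum(axis=-1).mean()
    return bias_sq, variance
```

The paper's generalized estimator also covers losses beyond squared error (e.g. cross-entropy); the sketch above only illustrates the squared-error case.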
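
The quoted training setup maps directly onto a standard SGD configuration. A sketch in PyTorch (the paper does not name its framework, and the model below is a placeholder rather than the ResNet34 actually trained):

```python
import torch

# Placeholder model; the paper trains a ResNet34 on CIFAR10, but the exact
# model construction and framework are not specified in the quoted text.
model = torch.nn.Linear(3 * 32 * 32, 10)

# Quoted settings: SGD with momentum 0.9, initial learning rate 0.1,
# weight decay 5e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# Stage-wise training: decay the learning rate by a factor of 10 every
# 200 epochs (call scheduler.step() once per epoch in the training loop).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.1)
```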