The Pitfalls of Simplicity Bias in Neural Networks

Authors: Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, Praneeth Netrapalli

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through theoretical analysis and targeted experiments on these datasets, we make four observations: (i) SB of SGD and variants can be extreme: neural networks can exclusively rely on the simplest feature and remain invariant to all predictive complex features. (ii) The extreme aspect of SB could explain why seemingly benign distribution shifts and small adversarial perturbations significantly degrade model performance. (iii) Contrary to conventional wisdom, SB can also hurt generalization on the same data distribution, as SB persists even when the simplest feature has less predictive power than the more complex features. (iv) Common approaches to improve generalization and robustness (ensembles and adversarial training) can fail in mitigating SB and its pitfalls. Given the role of SB in training neural networks, we hope that the proposed datasets and methods serve as an effective testbed to evaluate novel algorithmic approaches aimed at avoiding the pitfalls of SB."
Researcher Affiliation | Collaboration | Harshay Shah (Microsoft Research, harshay.rshah@gmail.com); Kaustav Tamuly (Microsoft Research, ktamuly2@gmail.com); Aditi Raghunathan (Stanford University, aditir@stanford.edu); Prateek Jain (Microsoft Research, prajain@microsoft.com); Praneeth Netrapalli (Microsoft Research, praneeth@microsoft.com)
Pseudocode | No | The paper describes its methods verbally and with equations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Note that all code and datasets are available at the following repository: https://github.com/harshays/simplicitybiaspitfalls."
Open Datasets | Yes | "MNIST-CIFAR Data: The MNIST-CIFAR dataset consists of two classes: images in class 1 and class 2 are vertical concatenations of MNIST digit zero & CIFAR-10 automobile and MNIST digit one & CIFAR-10 truck images respectively, as shown in Figure 2."
Dataset Splits | Yes | "The training and test datasets comprise 50,000 and 10,000 images of size 3 × 64 × 32. ... SGD-trained FCNs that are selected based on validation accuracy after performing a grid search over four SGD hyperparameters: learning rate, batch size, momentum, and weight decay. The train, test and randomized accuracies in Table 2 collectively show that FCNs exclusively rely on the noisy linear feature and consequently attain 5% generalization error."
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models or CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions various models and optimizers (e.g., SGD, MobileNetV2, ResNet50) but does not list software libraries or dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | No | The paper reports a grid search over learning rate, batch size, momentum, and weight decay, and describes model architectures and optimizers, but it does not state the concrete hyperparameter values behind the results in the main tables and figures.
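The MNIST-CIFAR construction quoted in the Open Datasets row (vertical concatenation yielding 3 × 64 × 32 images) can be sketched as below. This is a minimal sketch using synthetic placeholder arrays rather than the real datasets; the zero-padding of the 28 × 28 digit to 32 × 32 and the grayscale-to-3-channel replication are assumptions, since the paper specifies only the concatenation and the final image size.

```python
import numpy as np

def make_mnist_cifar_pair(mnist_img, cifar_img):
    """Vertically concatenate an MNIST digit (28x28) with a CIFAR-10
    image (3x32x32) into one 3x64x32 MNIST-CIFAR example."""
    # Zero-pad the 28x28 digit to 32x32 (assumption: symmetric padding).
    padded = np.pad(mnist_img, 2)                   # shape (32, 32)
    # Replicate the grayscale digit across 3 channels (assumption).
    mnist_3ch = np.repeat(padded[None], 3, axis=0)  # shape (3, 32, 32)
    # Stack digit on top of the CIFAR image along the height axis.
    return np.concatenate([mnist_3ch, cifar_img], axis=1)  # (3, 64, 32)

# Synthetic placeholders standing in for real MNIST/CIFAR tensors.
digit = np.zeros((28, 28), dtype=np.float32)
auto = np.zeros((3, 32, 32), dtype=np.float32)
print(make_mnist_cifar_pair(digit, auto).shape)  # (3, 64, 32)
```

A real pipeline would draw `digit` from MNIST classes {0, 1} and `auto` from CIFAR-10 classes {automobile, truck}, pairing zero with automobile for class 1 and one with truck for class 2.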
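The grid search over four SGD hyperparameters mentioned in the Dataset Splits and Experiment Setup rows can be sketched as follows. The grid values and the stub evaluator here are hypothetical, since the paper does not report the concrete values it searched; a real run would train an FCN with SGD for each configuration and score it on held-out validation data.

```python
from itertools import product

# Hypothetical grid; the paper does not report the values it searched.
grid = {
    "lr": [0.1, 0.01],
    "batch_size": [128, 256],
    "momentum": [0.0, 0.9],
    "weight_decay": [0.0, 5e-4],
}

def grid_search(train_and_eval):
    """Return the configuration with the highest validation accuracy."""
    best_cfg, best_acc = None, -1.0
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        acc = train_and_eval(cfg)  # train a model, score on validation set
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

# Stub evaluator for illustration only (favors the smaller learning rate).
best, acc = grid_search(lambda cfg: 0.95 - cfg["lr"] * 0.1)
```

Selecting by validation accuracy, as the paper describes, keeps the test set untouched until the final report.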