Simplicity Bias in 1-Hidden Layer Neural Networks

Authors: Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that models trained on real datasets such as Imagenet and Waterbirds-Landbirds indeed depend on a low dimensional projection of the inputs, thereby demonstrating SB on these datasets.
Researcher Affiliation | Collaboration | Depen Morwani, Department of Computer Science, Harvard University, dmorwani@g.harvard.edu; Jatin Batra, School of Technology and Computer Science, Tata Institute of Fundamental Research (TIFR), jatin.batra@tifr.res.in; Prateek Jain, Google Research, prajain@google.com; Praneeth Netrapalli, Google Research, pnetrapalli@google.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the authors' source code is open-source or publicly available.
Open Datasets | Yes | Empirically, we demonstrate LD-SB on three real world datasets: binary and multiclass versions of Imagenette (Fast AI, 2021), waterbirds-landbirds (Sagawa et al., 2020a), as well as the ImageNet (Deng et al., 2009) dataset.
Dataset Splits | Yes | For each of the runs, we tune the batch size, learning rate and weight decay using validation accuracy.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. It mentions using "Imagenet pretrained Resnet-50 models" but not the hardware used for training or inference.
Software Dependencies | No | The paper mentions general software components like "SGD" and using a "Resnet-50" model, but does not specify any programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Every model is trained for 20000 steps (100000 for Imagenet) with a warmup and cosine decay learning rate scheduler. For each of the runs, we tune the batch size, learning rate and weight decay using validation accuracy. Hyperparameter tuning details: batch size {128, 256}; learning rate in the rich regime {0.5, 1.0} (for Imagenet, {5.0, 10.0}, as the learning rate in the rich regime needs to scale up with the hidden dimension) and in the lazy regime {0.01, 0.05}; weight decay {0, 1e-4}. (A hedged training-loop sketch based on this setup follows the table.)
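
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of that training and tuning protocol, assuming a 1-hidden-layer ReLU network trained with SGD for a fixed number of steps under a linear-warmup plus cosine-decay schedule, with the quoted grids searched by validation accuracy. The hidden width (`hidden_dim=1024`), warmup length (`warmup_steps=1_000`), absence of momentum, and the `make_loaders` helper are placeholders and assumptions, not details taken from the paper; only the step counts and the hyperparameter grids come from the quoted setup.

```python
# Hedged sketch of the quoted experiment setup: SGD + warmup/cosine-decay schedule,
# grid search over batch size, learning rate, and weight decay by validation accuracy.
import itertools
import math

import torch
import torch.nn as nn


def warmup_cosine_lr(step, total_steps, warmup_steps):
    """Learning-rate scale factor: linear warmup followed by cosine decay."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def train_one_config(train_loader, val_loader, input_dim, num_classes,
                     lr, weight_decay, total_steps=20_000,
                     warmup_steps=1_000, hidden_dim=1024):
    # 1-hidden-layer ReLU network; hidden_dim is a placeholder, not from the paper.
    model = nn.Sequential(nn.Flatten(),
                          nn.Linear(input_dim, hidden_dim),
                          nn.ReLU(),
                          nn.Linear(hidden_dim, num_classes))
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda s: warmup_cosine_lr(s, total_steps, warmup_steps))
    loss_fn = nn.CrossEntropyLoss()

    data_iter = itertools.cycle(train_loader)  # step-based, not epoch-based, training
    for _ in range(total_steps):
        x, y = next(data_iter)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        sched.step()

    # Validation accuracy is the selection criterion for the grid search below.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return model, correct / max(1, total)


def grid_search(make_loaders, input_dim, num_classes, regime="rich"):
    # Grids quoted above; Imagenet's larger rich-regime rates ({5.0, 10.0}) are omitted.
    lrs = {"rich": [0.5, 1.0], "lazy": [0.01, 0.05]}[regime]
    best_model, best_acc, best_cfg = None, -1.0, None
    for bs, lr, wd in itertools.product([128, 256], lrs, [0.0, 1e-4]):
        train_loader, val_loader = make_loaders(bs)  # user-supplied data loaders
        model, val_acc = train_one_config(train_loader, val_loader,
                                          input_dim, num_classes, lr, wd)
        if val_acc > best_acc:
            best_model, best_acc = model, val_acc
            best_cfg = {"batch_size": bs, "lr": lr, "weight_decay": wd}
    return best_model, best_acc, best_cfg
```

Training is step-based (via `itertools.cycle`) rather than epoch-based to match the "20000 steps" phrasing of the quoted setup; everything else about data loading and the network width is an illustrative assumption.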