Simplicity Bias in 1-Hidden Layer Neural Networks
Authors: Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that models trained on real datasets such as ImageNet and Waterbirds-Landbirds indeed depend on a low dimensional projection of the inputs, thereby demonstrating SB on these datasets. |
| Researcher Affiliation | Collaboration | Depen Morwani (Department of Computer Science, Harvard University, dmorwani@g.harvard.edu); Jatin Batra (School of Technology and Computer Science, Tata Institute of Fundamental Research (TIFR), jatin.batra@tifr.res.in); Prateek Jain (Google Research, prajain@google.com); Praneeth Netrapalli (Google Research, pnetrapalli@google.com) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the authors' source code is open-source or publicly available. |
| Open Datasets | Yes | Empirically, we demonstrate LD-SB on three real world datasets: binary and multiclass versions of Imagenette (Fast AI, 2021), waterbirds-landbirds (Sagawa et al., 2020a), as well as the ImageNet (Deng et al., 2009) dataset. |
| Dataset Splits | Yes | For each of the runs, we tune the batch size, learning rate and weight decay using validation accuracy. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. It mentions using "Imagenet pretrained Resnet-50 models" but not the hardware for training or inference. |
| Software Dependencies | No | The paper mentions general software components like "SGD" and using a "Resnet-50" model, but does not specify any programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Every model is trained for 20,000 steps (100,000 for ImageNet) with a warmup and cosine decay learning rate scheduler. For each of the runs, we tune the batch size, learning rate and weight decay using validation accuracy. Hyperparameter tuning grid: batch size {128, 256}; learning rate, rich regime: {0.5, 1.0} (for ImageNet, {5.0, 10.0}, as the learning rate in the rich regime needs to scale up with the hidden dimension); learning rate, lazy regime: {0.01, 0.05}; weight decay: {0, 1e-4}. (See the scheduler sketch after the table.) |
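
The sketch below illustrates the training recipe quoted in the Experiment Setup row: SGD with weight decay, a fixed number of steps, and a linear-warmup plus cosine-decay learning-rate schedule over the hyperparameter grid reported in the paper. It is a hedged reconstruction, not the authors' code; the framework (PyTorch), the warmup length, the network dimensions, and the stand-in random data are all assumptions, since the paper specifies only the step counts and the tuning grids.

```python
# Minimal sketch of the reported training setup (assumptions noted inline).
import math
import torch
import torch.nn as nn

def warmup_cosine(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)              # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay to 0

# Tuning grid taken from the table above; rich vs. lazy regime differ only
# in the base learning rate here.
grid = {
    "batch_size": [128, 256],
    "lr_rich": [0.5, 1.0],       # {5.0, 10.0} for ImageNet per the paper
    "lr_lazy": [0.01, 0.05],
    "weight_decay": [0.0, 1e-4],
}

total_steps = 20_000             # 100,000 for ImageNet per the paper
warmup_steps = 1_000             # assumed; warmup length is not reported
d_in, width, n_classes = 256, 512, 2   # assumed toy dimensions

# 1-hidden-layer network, trained with SGD (one grid point shown).
model = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, n_classes))
optimizer = torch.optim.SGD(model.parameters(),
                            lr=grid["lr_rich"][0],
                            weight_decay=grid["weight_decay"][1])
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda s: warmup_cosine(s, warmup_steps, total_steps))

loss_fn = nn.CrossEntropyLoss()
for step in range(total_steps):
    x = torch.randn(grid["batch_size"][0], d_in)             # stand-in batch
    y = torch.randint(0, n_classes, (grid["batch_size"][0],))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()
```

In practice one would loop this training run over the full grid and pick the configuration with the best validation accuracy, as the quoted setup describes.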