Simplicity Bias in 1-Hidden Layer Neural Networks
Authors: Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that models trained on real datasets such as ImageNet and Waterbirds-Landbirds indeed depend on a low dimensional projection of the inputs, thereby demonstrating SB on these datasets. |
| Researcher Affiliation | Collaboration | Depen Morwani (Department of Computer Science, Harvard University, dmorwani@g.harvard.edu); Jatin Batra (School of Technology and Computer Science, Tata Institute of Fundamental Research (TIFR), jatin.batra@tifr.res.in); Prateek Jain (Google Research, prajain@google.com); Praneeth Netrapalli (Google Research, pnetrapalli@google.com) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the authors' source code is open-source or publicly available. |
| Open Datasets | Yes | Empirically, we demonstrate LD-SB on three real world datasets: binary and multiclass versions of Imagenette (Fast AI, 2021), waterbirds-landbirds (Sagawa et al., 2020a), as well as the ImageNet (Deng et al., 2009) dataset. |
| Dataset Splits | Yes | For each of the runs, we tune the batch size, learning rate and weight decay using validation accuracy. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. It mentions using "Imagenet pretrained Resnet-50 models" but not the hardware for training or inference. |
| Software Dependencies | No | The paper mentions general software components like "SGD" and using a "Resnet-50" model, but does not specify any programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Every model is trained for 20,000 steps (100,000 for ImageNet) with a warmup and cosine decay learning rate scheduler. For each of the runs, we tune the batch size, learning rate and weight decay using validation accuracy. Hyperparameter tuning grid: batch size {128, 256}; learning rate, rich regime: {0.5, 1.0} (for ImageNet, {5.0, 10.0}, as the learning rate in the rich regime needs to scale up with the hidden dimension); learning rate, lazy regime: {0.01, 0.05}; weight decay: {0, 1e-4}. (See the scheduler sketch after the table.) |
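
The sketch below illustrates the training recipe quoted in the Experiment Setup row: SGD with weight decay, a fixed number of steps, and a linear-warmup plus cosine-decay learning-rate schedule over the hyperparameter grid reported in the paper. It is a hedged reconstruction, not the authors' code; the framework (PyTorch), the warmup length, the network dimensions, and the stand-in random data are all assumptions, since the paper specifies only the step counts and the tuning grids.

```python
# Minimal sketch of the reported training setup (assumptions noted inline).
import math
import torch
import torch.nn as nn

def warmup_cosine(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)              # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay to 0

# Tuning grid taken from the table above; rich vs. lazy regime differ only
# in the base learning rate here.
grid = {
    "batch_size": [128, 256],
    "lr_rich": [0.5, 1.0],       # {5.0, 10.0} for ImageNet per the paper
    "lr_lazy": [0.01, 0.05],
    "weight_decay": [0.0, 1e-4],
}

total_steps = 20_000             # 100,000 for ImageNet per the paper
warmup_steps = 1_000             # assumed; warmup length is not reported
d_in, width, n_classes = 256, 512, 2   # assumed toy dimensions

# 1-hidden-layer network, trained with SGD (one grid point shown).
model = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, n_classes))
optimizer = torch.optim.SGD(model.parameters(),
                            lr=grid["lr_rich"][0],
                            weight_decay=grid["weight_decay"][1])
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda s: warmup_cosine(s, warmup_steps, total_steps))

loss_fn = nn.CrossEntropyLoss()
for step in range(total_steps):
    x = torch.randn(grid["batch_size"][0], d_in)             # stand-in batch
    y = torch.randint(0, n_classes, (grid["batch_size"][0],))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()
```

In practice one would loop this training run over the full grid and pick the configuration with the best validation accuracy, as the quoted setup describes.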