Scaling MLPs: A Tale of Inductive Bias
Authors: Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that lack of inductive bias can indeed be compensated. |
| Researcher Affiliation | Academia | Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann, ETH Zürich, Switzerland |
| Pseudocode | Yes | Appendix D (Inverted Bottleneck MLP Code): We provide PyTorch-style pseudo-code for the inverted bottleneck MLP to highlight its simplicity. A minimal sketch in this spirit is included below the table. |
| Open Source Code | Yes | Code and checkpoints available at https://github.com/gregorbachmann/scaling_mlps |
| Open Datasets | Yes | We study the popular tasks CIFAR10, CIFAR100 (Krizhevsky, 2009), STL10 (Coates et al., 2011), TinyImageNet (Le and Yang, 2015), ImageNet1k for evaluation, as well as ImageNet21k (Deng et al., 2009) for pre-training. |
| Dataset Splits | No | The paper mentions pre-training and fine-tuning on various datasets and evaluates test error, but it does not explicitly specify the splitting methodology or the proportions of the training, validation, and test sets, nor does it use the term 'validation' in the context of dataset splits. |
| Hardware Specification | Yes | All of our experiments were conducted on a single NVIDIA RTX A5000 GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions using PyTorch, SciPy library, FFCV framework, and the LION optimizer, but it does not specify version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | All models were trained with the LION optimizer (Chen et al., 2023) with a learning rate η = 5e-5. In order to combat overfitting we use strong label smoothing α = 0.3. We center and normalize all the images and use random flips and crops as well as MixUp (Zhang et al., 2018) as data augmentations. A hedged training-loop sketch follows the table. |
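
The Pseudocode row above refers to the PyTorch-style pseudo-code for the inverted bottleneck MLP in the paper's Appendix D. The sketch below illustrates the general idea under stated assumptions rather than reproducing the authors' exact code: a pre-norm residual block that widens the hidden dimension by an expansion factor (assumed here to be 4) and projects back, stacked on top of a linear embedding of the flattened image. Layer counts, widths, and the class names `InvertedBottleneckBlock` and `MLP` are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn


class InvertedBottleneckBlock(nn.Module):
    """Pre-norm inverted bottleneck MLP block: expand, non-linearity, project, residual."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc_expand = nn.Linear(dim, expansion * dim)   # widen the hidden dimension
        self.fc_project = nn.Linear(expansion * dim, dim)  # project back down
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fc_project(self.act(self.fc_expand(self.norm(x))))


class MLP(nn.Module):
    """Flattened-image MLP: linear embedding, stacked bottleneck blocks, linear classifier."""

    def __init__(self, in_dim: int, dim: int, depth: int, num_classes: int, expansion: int = 4):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)
        self.blocks = nn.Sequential(
            *[InvertedBottleneckBlock(dim, expansion) for _ in range(depth)]
        )
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.flatten(1)  # images are flattened; no spatial inductive bias is used
        x = self.embed(x)
        x = self.blocks(x)
        return self.head(self.norm(x))


# Example: a CIFAR10-sized input (3 * 32 * 32 = 3072 input features)
model = MLP(in_dim=3 * 32 * 32, dim=512, depth=6, num_classes=10)
logits = model(torch.randn(8, 3, 32, 32))
print(logits.shape)  # torch.Size([8, 10])
```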
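
The Experiment Setup row quotes the reported training recipe: LION with η = 5e-5, label smoothing α = 0.3, and MixUp. The loop below is a minimal sketch of that recipe, not the authors' training code. It assumes the `model` from the sketch above, a hypothetical `loader` yielding batches of flip/crop-augmented CIFAR10 images, and the third-party `lion-pytorch` package as a stand-in LION implementation (the paper does not name a specific package); the `mixup` helper is likewise illustrative.

```python
import torch
import torch.nn as nn
from lion_pytorch import Lion  # assumed third-party LION implementation, not specified in the paper


def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Blend random pairs of examples; return mixed inputs, both label sets, and the mixing weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam


criterion = nn.CrossEntropyLoss(label_smoothing=0.3)  # strong label smoothing, α = 0.3
optimizer = Lion(model.parameters(), lr=5e-5)         # LION with learning rate η = 5e-5

for images, labels in loader:  # 'loader' is a hypothetical DataLoader of augmented CIFAR10 batches
    mixed, y_a, y_b, lam = mixup(images, labels)
    logits = model(mixed)
    # MixUp loss: interpolate the losses against both label sets with the same weight
    loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```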