Scaling MLPs: A Tale of Inductive Bias

Authors: Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that the performance of MLPs drastically improves with scale (95% on CIFAR10, 82% on CIFAR100, 58% on ImageNet ReaL), highlighting that lack of inductive bias can indeed be compensated.
Researcher Affiliation | Academia | Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann (ETH Zürich, Switzerland)
Pseudocode | Yes | Appendix D, "Inverted Bottleneck MLP Code": We provide PyTorch-style pseudo-code for the inverted bottleneck MLP to highlight its simplicity. (A hedged sketch of such a block is given after the table.)
Open Source Code | Yes | Code and checkpoints are available at https://github.com/gregorbachmann/scaling_mlps
Open Datasets | Yes | We study the popular tasks CIFAR10, CIFAR100 (Krizhevsky, 2009), STL10 (Coates et al., 2011), TinyImageNet (Le and Yang, 2015), and ImageNet1k for evaluation, as well as ImageNet21k (Deng et al., 2009) for pre-training.
Dataset Splits | No | The paper mentions pre-training and fine-tuning on various datasets and evaluates test error, but does not explicitly specify the splitting methodology or proportions for training, validation, and test sets; it does not use the term 'validation' in the context of dataset splits for reproduction.
Hardware Specification | Yes | All of our experiments were conducted on a single NVIDIA RTX A5000 GPU with 24GB of memory.
Software Dependencies | No | The paper mentions using PyTorch, the SciPy library, the FFCV framework, and the LION optimizer, but it does not specify version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | All models were trained with the LION optimizer (Chen et al., 2023) with a learning rate η = 5e-5. In order to combat overfitting we use strong label smoothing α = 0.3. We center and normalize all the images and use random flips and crops as well as MixUp (Zhang et al., 2018) as data augmentations. (A hedged training-setup sketch is given after the table.)
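The Pseudocode row above refers to the paper's PyTorch-style pseudo-code for the inverted bottleneck MLP. The following is a minimal sketch of that kind of architecture, not the authors' released code: the expansion factor, pre-norm residual placement, and all class/parameter names (InvertedBottleneckBlock, ScalingMLP, dim, depth) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InvertedBottleneckBlock(nn.Module):
    """One inverted-bottleneck MLP block: widen, apply a nonlinearity,
    project back, with a residual connection. Pre-layer-norm placement
    and the expansion factor of 4 are assumptions for illustration."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, expansion * dim)   # expand
        self.act = nn.GELU()
        self.fc2 = nn.Linear(expansion * dim, dim)   # project back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fc2(self.act(self.fc1(self.norm(x))))

class ScalingMLP(nn.Module):
    """Plain MLP on flattened images: linear stem, a stack of
    inverted-bottleneck blocks, and a linear classifier head."""
    def __init__(self, in_features: int, dim: int, depth: int, num_classes: int):
        super().__init__()
        self.stem = nn.Linear(in_features, dim)
        self.blocks = nn.Sequential(*[InvertedBottleneckBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Images are flattened to vectors, so no convolutional inductive bias is used.
        x = torch.flatten(x, start_dim=1)
        return self.head(self.blocks(self.stem(x)))

# Example usage: a CIFAR10-sized model (3x32x32 inputs, 10 classes); sizes are illustrative.
model = ScalingMLP(in_features=3 * 32 * 32, dim=512, depth=6, num_classes=10)
logits = model(torch.randn(8, 3, 32, 32))  # -> shape (8, 10)
```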
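The Experiment Setup row quotes the reported hyperparameters (LION with η = 5e-5, label smoothing α = 0.3, centering/normalization, random flips and crops, MixUp). Below is a rough training-step sketch under those settings, not the authors' training script: it assumes the third-party lion-pytorch package for the LION optimizer, reuses the hypothetical ScalingMLP class from the sketch above, and the crop padding, normalization statistics, and MixUp alpha are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
# LION is not part of core PyTorch; this sketch assumes the third-party
# `lion-pytorch` package (pip install lion-pytorch). Exact package versions
# are not specified in the paper.
from lion_pytorch import Lion

# Augmentations named in the paper: centering/normalization, random flips and crops.
# CIFAR-style crop padding and the normalization statistics here are illustrative.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

model = ScalingMLP(in_features=3 * 32 * 32, dim=512, depth=6, num_classes=10)  # hypothetical model from the sketch above
optimizer = Lion(model.parameters(), lr=5e-5)          # LION with learning rate eta = 5e-5
criterion = nn.CrossEntropyLoss(label_smoothing=0.3)   # strong label smoothing, alpha = 0.3

def mixup(x, y, alpha=0.2):
    """Minimal MixUp (Zhang et al., 2018); the mixing alpha is an assumption."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

def train_step(images, labels):
    """One optimization step combining MixUp with the label-smoothed loss."""
    mixed, y_a, y_b, lam = mixup(images, labels)
    logits = model(mixed)
    loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```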