Beyond neural scaling laws: beating power law scaling via data pruning

Authors: Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari Morcos

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet." (See the pruning sketch after this table.)
Researcher Affiliation | Collaboration | Ben Sorscher (1), Robert Geirhos (2), Shashank Shekhar (3), Surya Ganguli (1,3), Ari S. Morcos (3); equal contribution noted. Affiliations: (1) Department of Applied Physics, Stanford University; (2) University of Tübingen; (3) Meta AI (FAIR).
Pseudocode | No | The provided text does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "All code for theory plots and numerical perceptron simulations is packaged in a reproducible colab notebook." (See the perceptron sketch after this table.)
Open Datasets | Yes | "We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. ... Vision Transformers fine-tuned on CIFAR-10."
Dataset Splits | No | The paper refers to "App. B for pruning and training details", which may contain split information, but these details are not given in the main text. Figure 5B mentions "top-5 validation accuracy", implying a validation set was used, but the split itself is not made explicit.
Hardware Specification | No | The paper indicates (per checklist item 3d) that the "total amount of compute and the type of resources used" are reported, but these details are not present in the provided text.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers in the provided text.
Experiment Setup | No | The paper mentions models such as ResNet18 and datasets such as CIFAR-10, and refers to "App. B for all pruning/training details", but it does not specify explicit hyperparameters (e.g., learning rate, batch size) or other system-level training settings in the provided text.
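
The "Research Type" and "Open Datasets" rows quote the paper's central empirical claim: pruning a training set with a per-example difficulty score can yield better-than-power-law scaling of error with dataset size. The snippet below is a minimal sketch of score-based pruning, not the authors' pipeline; the random scores, the `frac_keep` value, and the `keep_hard` switch are illustrative assumptions (the paper uses metrics such as distance to a k-means centroid in embedding space, and finds that keeping hard examples helps when data is abundant while keeping easy examples helps when data is scarce).

```python
import numpy as np

def prune_by_score(scores: np.ndarray, frac_keep: float, keep_hard: bool = True) -> np.ndarray:
    """Return indices of the examples to retain after pruning.

    scores    : per-example difficulty scores (higher = harder); assumed given.
    frac_keep : fraction of the dataset to retain.
    keep_hard : keep the hardest examples (abundant-data regime) or, if False,
                the easiest examples (scarce-data regime).
    """
    n_keep = max(1, int(round(frac_keep * len(scores))))
    order = np.argsort(scores)                 # easiest first
    return order[-n_keep:] if keep_hard else order[:n_keep]

# Hypothetical usage: retain 60% of a 50k-example dataset, keeping hard examples.
rng = np.random.default_rng(0)
scores = rng.random(50_000)                    # stand-in for a real difficulty metric
kept = prune_by_score(scores, frac_keep=0.6, keep_hard=True)
print(len(kept), "examples retained")
```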
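The "Open Source Code" row notes that the theory plots and numerical perceptron simulations ship as a reproducible Colab notebook. The sketch below is an independent re-creation of a pruned teacher-student perceptron experiment under stated assumptions (Gaussian inputs, labels from a random unit-norm teacher, margin along the teacher direction as the difficulty score, a logistic-regression student); it is not the authors' notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_total, frac_keep = 100, 4000, 0.5

# Assumed teacher-student setup: random unit-norm teacher, Gaussian inputs, sign labels.
teacher = rng.normal(size=d)
teacher /= np.linalg.norm(teacher)
X = rng.normal(size=(n_total, d))
y = np.where(X @ teacher >= 0, 1, -1)

# Difficulty score: margin along the teacher direction (small margin = hard example).
margin = np.abs(X @ teacher)
n_keep = int(frac_keep * n_total)
keep_hard = np.argsort(margin)[:n_keep]        # keep the hardest examples

# Student: plain logistic regression fit on the pruned set (an assumption).
student = LogisticRegression(max_iter=1000).fit(X[keep_hard], y[keep_hard])
w = student.coef_.ravel()

# Teacher-student overlap and the standard perceptron generalization error arccos(overlap)/pi.
overlap = (w @ teacher) / np.linalg.norm(w)
gen_error = np.arccos(np.clip(overlap, -1.0, 1.0)) / np.pi
print(f"overlap: {overlap:.3f}, generalization error: {gen_error:.3f}")
```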