Beyond neural scaling laws: beating power law scaling via data pruning
Authors: Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari Morcos
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. |
| Researcher Affiliation | Collaboration | Ben Sorscher (1), Robert Geirhos (2), Shashank Shekhar (3), Surya Ganguli (1,3), Ari S. Morcos (3); equal contribution. 1: Department of Applied Physics, Stanford University; 2: University of Tübingen; 3: Meta AI (FAIR) |
| Pseudocode | No | The provided text does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code for theory plots and numerical perceptron simulations is packaged in a reproducible colab notebook. |
| Open Datasets | Yes | We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. ... Vision Transformers fine-tuned on CIFAR-10. |
| Dataset Splits | No | The paper refers to 'App. B for pruning and training details' which may contain split information, but this is not provided in the main text. Figure 5B mentions 'top-5 validation accuracy', implying a validation set was used, but its split details are not explicit. |
| Hardware Specification | No | The paper mentions that the 'total amount of compute and the type of resources used' are included (as per checklist 3d), but these details are not present in the provided text. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers in the provided text. |
| Experiment Setup | No | The paper mentions models like 'ResNet18' and datasets like 'CIFAR-10', and refers to 'App. B for all pruning/training details', but it does not specify explicit hyperparameters (e.g., learning rate, batch size) or other system-level training settings in the provided text. |
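
To make the "better than power law scaling via data pruning" claim in the Research Type row concrete, the following is a minimal sketch of the keep-the-hard-examples pruning idea, not the authors' ResNet/ImageNet protocol. The probe model, margin-based score, synthetic data, and retained fractions are illustrative assumptions; the paper's own experiments and metrics (e.g., self-supervised prototype distances on ImageNet) are more involved.

```python
# Hedged sketch: margin-based data pruning on synthetic data, keeping the
# hardest training examples and measuring test error vs. retained fraction.
# All choices below (probe model, score, fractions) are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset such as CIFAR-10.
X, y = make_classification(n_samples=20000, n_features=50, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# A probe model trained on a small random subset supplies per-example scores.
rng = np.random.RandomState(0)
probe_idx = rng.choice(len(X_train), 2000, replace=False)
probe = LogisticRegression(max_iter=1000).fit(X_train[probe_idx],
                                              y_train[probe_idx])

# Score = distance of the predicted class-1 probability from 0.5;
# a small margin marks an example as "hard" for the probe.
margins = np.abs(probe.predict_proba(X_train)[:, 1] - 0.5)

for keep_frac in (1.0, 0.8, 0.6, 0.4, 0.2):
    n_keep = int(keep_frac * len(X_train))
    keep = np.argsort(margins)[:n_keep]  # retain the hardest examples
    model = LogisticRegression(max_iter=1000).fit(X_train[keep], y_train[keep])
    err = 1.0 - model.score(X_test, y_test)
    print(f"kept {keep_frac:.0%} of training data -> test error {err:.3f}")
```

Plotting test error against the retained dataset size on log-log axes is the kind of comparison the paper uses to argue that a well-chosen pruning metric can beat the power-law scaling obtained from random subsets.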