Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Authors: Benjamin Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate the theoretical analysis with a systematic empirical study of offline sparse parity learning using SGD on MLPs, demonstrating some of the (perhaps) counterintuitive effects of width, data, and initialization. We launch a large-scale (~200K GPU training runs) exploration of resource tradeoffs when training neural networks to solve the offline sparse parity problem. (An illustrative sketch of this task appears below the table.) |
| Researcher Affiliation | Collaboration | Harvard University; University of Pennsylvania; Hebrew University of Jerusalem; Microsoft Research NYC |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it include a specific repository link or an explicit code release statement. |
| Open Datasets | Yes | To this end, we use the benchmark assembled by Grinsztajn et al. (2022), a work which specifically investigates the performance gap between neural networks and tree-based classifiers (e.g. random forests, gradient-boosted trees), and includes a standardized suite of 16 classification benchmarks with numerical input features. |
| Dataset Splits | No | The paper mentions 'Example train & test error curves' and 'subsampling varying fractions of each dataset for training' but does not provide specific dataset split percentages, sample counts for each split, or detailed methodology for train/validation/test splits. |
| Hardware Specification | No | The paper mentions '200K GPU training runs' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch and Scikit-learn in its references but does not provide specific version numbers for these or other key software components used in their experiments. |
| Experiment Setup | No | The paper mentions using 'identical hyperparameters' and a specific initialization scheme ('s=2') but does not provide comprehensive details on concrete hyperparameter values such as learning rate, batch size, or optimizer settings needed for full experimental reproduction. |
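
To make the object of study concrete, below is a minimal, hypothetical PyTorch sketch of offline sparse parity learning with a one-hidden-layer MLP, including a sparse axis-aligned initialization interpreted here as `s` nonzero input weights per hidden neuron (the "s=2" mentioned above). Every numeric choice (input dimension, parity size `k`, width, learning rate, batch size, epoch count) is an illustrative placeholder rather than a value reported in the paper; as the rows above note, the authors do not specify these hyperparameters.

```python
# Hypothetical sketch of offline sparse parity learning with an MLP trained by SGD.
# All dimensions and optimizer settings are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn

def sparse_parity_data(n_samples, n_bits=50, k=3, seed=0):
    """Uniform ±1 inputs; the label is the parity (product) of k fixed coordinates."""
    g = torch.Generator().manual_seed(seed)
    X = torch.randint(0, 2, (n_samples, n_bits), generator=g).float() * 2 - 1
    support = torch.arange(k)          # assume WLOG the first k bits are the relevant ones
    y = X[:, support].prod(dim=1)      # parity in ±1 form
    return X, (y > 0).long()           # convert to {0, 1} class labels

class OneHiddenMLP(nn.Module):
    def __init__(self, n_bits=50, width=1000, init_sparsity=None):
        super().__init__()
        self.fc1 = nn.Linear(n_bits, width)
        self.fc2 = nn.Linear(width, 2)
        if init_sparsity is not None:  # sparse axis-aligned init: s nonzeros per hidden neuron
            with torch.no_grad():
                mask = torch.zeros_like(self.fc1.weight)
                for row in mask:
                    row[torch.randperm(n_bits)[:init_sparsity]] = 1.0
                self.fc1.weight.mul_(mask)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Offline setting: a fixed training set is reused across many SGD epochs.
X_train, y_train = sparse_parity_data(10_000)
X_test, y_test = sparse_parity_data(10_000, seed=1)
model = OneHiddenMLP(init_sparsity=2)            # s=2, as quoted above
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    for i in range(0, len(X_train), 32):         # batch size 32 is an illustrative choice
        xb, yb = X_train[i:i + 32], y_train[i:i + 32]
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

with torch.no_grad():
    test_acc = (model(X_test).argmax(dim=1) == y_test).float().mean()
print(f"test accuracy: {test_acc:.3f}")
```

This sketch only fixes the task semantics (parity of a hidden size-k subset of bits, learned offline by an MLP); sweeping width, training-set size, and initialization sparsity over such runs is the kind of resource-tradeoff exploration the paper describes, with its exact settings left unreported.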