Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Authors: Boaz Barak, Benjamin Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that a variety of neural networks successfully learn sparse parities, with discontinuous phase transitions in the training curves. On small instances, learning abruptly occurs at approximately n^{O(k)} iterations; this nearly matches SQ lower bounds, despite the apparent lack of a sparse prior. Our theoretical analysis shows that these observations are not explained by a Langevin-like mechanism, whereby SGD "stumbles in the dark" until it finds the hidden set of features (a natural algorithm which also runs in n^{O(k)} time). |
| Researcher Affiliation | Collaboration | Boaz Barak (Harvard University), Benjamin L. Edelman (Harvard University), Surbhi Goel (Microsoft Research & University of Pennsylvania), Sham Kakade (Harvard University), Eran Malach (Hebrew University of Jerusalem), Cyril Zhang (Microsoft Research) |
| Pseudocode | No | No pseudocode or algorithm block found. |
| Open Source Code | No | The paper's checklist states that code is included, but the main body provides no explicit statement or URL for source-code availability. |
| Open Datasets | No | The paper uses a synthetic dataset generated from the uniform distribution Unif({-1, +1}^n), labeled by a hidden k-sparse parity, but does not provide a link, DOI, or specific citation for public access to a pre-existing dataset. (A data-generation sketch follows this table.) |
| Dataset Splits | No | The paper mentions "validated on a batch of 2^13 samples" and "validation error", but does not specify explicit dataset split percentages, absolute sample counts for each split, or a reference to predefined standard splits for reproducibility. |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, or detailed computer specifications) used for running experiments is mentioned in the paper. |
| Software Dependencies | No | The paper mentions the use of PyTorch in references and Adam optimizer, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | for all combinations of n in {10, 20, 30}, k in {2, 3, 4}, batch sizes B in {1, 2, 4, ..., 1024}, initializations {uniform, Gaussian, Bernoulli}, loss functions {hinge, square, cross entropy}, and architecture configurations {(i), (ii), ..., (xv)}, SGD solved the parity problem (with 100% accuracy, validated on a batch of 2^13 samples) in at least 20% of 25 random trials, for at least one choice of learning rate in {0.001, 0.01, 0.1, 1}. (A minimal single-configuration training sketch follows this table.) |
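
The synthetic task underlying the rows above is simple to reconstruct. Below is a minimal PyTorch sketch of the data generation: inputs drawn uniformly from {-1, +1}^n, labeled by the parity of a hidden size-k subset of coordinates. The function name `sample_sparse_parity` and the choice of the first k coordinates as the hidden support are illustrative assumptions (any fixed size-k subset is equivalent by symmetry), not details taken from the paper.

```python
import torch

def sample_sparse_parity(num_samples: int, n: int, k: int, generator=None):
    """Sample the (n, k)-sparse parity task: inputs uniform on {-1, +1}^n,
    label = product of a hidden size-k subset of coordinates.

    Assumption for illustration: the hidden support is the first k
    coordinates; by symmetry, any fixed size-k subset behaves identically.
    """
    x = torch.randint(0, 2, (num_samples, n), generator=generator,
                      dtype=torch.float32) * 2 - 1   # uniform over {-1, +1}^n
    y = x[:, :k].prod(dim=1)                         # parity label in {-1, +1}
    return x, y
```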
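The experiment-setup quote describes a large sweep; the sketch below covers a single cell of that grid, reusing `sample_sparse_parity` from the sketch above. It assumes a one-hidden-layer ReLU MLP (the width, step budget, and evaluation interval are illustrative choices, not values from the paper), trains with online SGD on hinge loss, and applies the paper's success criterion of 100% accuracy on a fresh batch of 2^13 validation samples.

```python
import torch
from torch import nn

def train_one_configuration(n=30, k=3, batch_size=32, lr=0.1,
                            width=1000, max_steps=200_000):
    """Run one (n, k, batch size, learning rate) cell of the sweep.

    Returns the step at which 100% validation accuracy was first observed,
    or None if the problem was not solved within max_steps. Width and
    max_steps are illustrative assumptions, not values from the paper.
    """
    model = nn.Sequential(nn.Linear(n, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x_val, y_val = sample_sparse_parity(2**13, n, k)  # validation batch of 2^13 samples

    for step in range(max_steps):
        x, y = sample_sparse_parity(batch_size, n, k)  # fresh samples: online SGD
        margin = y * model(x).squeeze(-1)
        loss = torch.clamp(1 - margin, min=0).mean()   # hinge loss
        opt.zero_grad()
        loss.backward()
        opt.step()

        if step % 1000 == 0:                           # illustrative eval interval
            with torch.no_grad():
                acc = (model(x_val).squeeze(-1).sign() == y_val).float().mean().item()
            if acc == 1.0:
                return step                            # 100% validation accuracy
    return None
```

A full reproduction would loop this over the paper's grid of n, k, batch sizes, initializations, loss functions, learning rates, and the fifteen architecture configurations, counting a configuration as solved if at least 20% of 25 random trials succeed.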