Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Authors: Boaz Barak, Benjamin Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that a variety of neural networks successfully learn sparse parities, with discontinuous phase transitions in the training curves. On small instances, learning abruptly occurs at approximately n^O(k) iterations; this nearly matches SQ lower bounds, despite the apparent lack of a sparse prior. Our theoretical analysis shows that these observations are not explained by a Langevin-like mechanism, whereby SGD stumbles in the dark until it finds the hidden set of features (a natural algorithm which also runs in n^O(k) time). (A minimal sketch of this setup appears after the table.)
Researcher Affiliation | Collaboration | Boaz Barak (Harvard University), Benjamin L. Edelman (Harvard University), Surbhi Goel (Microsoft Research & University of Pennsylvania), Sham Kakade (Harvard University), Eran Malach (Hebrew University of Jerusalem), Cyril Zhang (Microsoft Research)
Pseudocode | No | No pseudocode or algorithm block found.
Open Source Code | No | The paper includes a checklist item stating that code is included, but the main body of the paper does not provide an explicit statement or URL for source code availability.
Open Datasets | No | The paper uses a synthetic dataset generated from a uniform distribution (Unif({-1, 1}^n)), but does not provide a link, DOI, or specific citation for public access to a pre-existing dataset.
Dataset Splits | No | The paper mentions "validated on a batch of 2^13 samples" and "validation error", but does not specify explicit dataset split percentages, absolute sample counts for splits, or a reference to predefined standard splits for reproducibility.
Hardware Specification | No | The paper does not mention the specific hardware (e.g., GPU models, CPU types, or other machine specifications) used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch (in the references) and the Adam optimizer, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | For all combinations of n in {10, 20, 30}, k in {2, 3, 4}, batch sizes B in {1, 2, 4, ..., 1024}, initializations {uniform, Gaussian, Bernoulli}, loss functions {hinge, square, cross entropy}, and architecture configurations {(i), (ii), ..., (xv)}, SGD solved the parity problem (with 100% accuracy, validated on a batch of 2^13 samples) in at least 20% of 25 random trials, for at least one choice of learning rate in {0.001, 0.01, 0.1, 1}. (The full grid is sketched after the table.)
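For concreteness, here is a minimal sketch of the sparse-parity setup described in the Research Type and Open Datasets rows: inputs drawn from Unif({-1, 1}^n), labels given by the product of k hidden coordinates, and a small network trained with SGD. This is not the authors' code; the width (128), learning rate (0.1), batch size (32), iteration budget, and choice of hidden subset are illustrative assumptions.

```python
# Hypothetical sketch of the (n, k)-sparse parity task; not the paper's code.
import torch

n, k = 30, 3                      # instance size and sparsity (values from the paper's grid)
secret = torch.arange(k)          # assumed hidden subset S; any k coordinates would do

def sample_batch(batch_size):
    """x ~ Unif({-1, +1}^n); label = product of the k secret coordinates."""
    x = torch.randint(0, 2, (batch_size, n)).float() * 2 - 1
    y = x[:, secret].prod(dim=1)  # +1 or -1
    return x, y

# One of many architectures the paper reports as succeeding; width 128 and the
# default PyTorch initialization are assumptions made for this sketch.
model = torch.nn.Sequential(
    torch.nn.Linear(n, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x_val, y_val = sample_batch(2 ** 13)  # validation batch size quoted in the table

for step in range(1, 100_001):
    x, y = sample_batch(32)
    out = model(x).squeeze(-1)
    loss = torch.clamp(1 - y * out, min=0).mean()  # hinge loss on +/-1 labels
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (torch.sign(model(x_val).squeeze(-1)) == y_val).float().mean().item()
        print(f"step {step:6d}  val acc {acc:.3f}")
        # Validation accuracy typically hovers near 0.5 for many steps and then
        # jumps to 1.0, i.e., the discontinuous phase transition described above.
```

Under this setup, the flat-then-sharp training curve is the "hidden progress" behavior described in the quoted abstract.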
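The Experiment Setup row describes a grid over problem sizes, batch sizes, initializations, losses, architectures, and learning rates. Below is a hedged sketch of how such a sweep could be enumerated; the paper does not publish sweep code, and the `config_i` labels are placeholders for its architecture configurations (i) through (xv).

```python
# Hypothetical enumeration of the sweep in the Experiment Setup row; the names
# and structure here are assumptions, not the authors' implementation.
import itertools

grid = {
    "n": [10, 20, 30],
    "k": [2, 3, 4],
    "batch_size": [2 ** i for i in range(11)],               # 1, 2, 4, ..., 1024
    "init": ["uniform", "gaussian", "bernoulli"],
    "loss": ["hinge", "square", "cross_entropy"],
    "architecture": [f"config_{i}" for i in range(1, 16)],   # placeholders for (i)-(xv)
    "lr": [0.001, 0.01, 0.1, 1.0],
}

for values in itertools.product(*grid.values()):
    run = dict(zip(grid.keys(), values))
    # Per the table: 25 random trials per setting; a setting counts as solved if
    # at least 20% of trials reach 100% accuracy on a 2^13-sample validation
    # batch, for at least one learning rate.
    ...  # training for this configuration would go here
```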