Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Authors: Boaz Barak, Benjamin Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that a variety of neural networks successfully learn sparse parities, with discontinuous phase transitions in the training curves. On small instances, learning abruptly occurs at approximately n^O(k) iterations; this nearly matches SQ lower bounds, despite the apparent lack of a sparse prior. Our theoretical analysis shows that these observations are not explained by a Langevin-like mechanism, whereby SGD stumbles in the dark until it finds the hidden set of features (a natural algorithm which also runs in n^O(k) time). (A minimal sketch of this setup appears after the table.)
Researcher Affiliation | Collaboration | Boaz Barak (Harvard University), Benjamin L. Edelman (Harvard University), Surbhi Goel (Microsoft Research & University of Pennsylvania), Sham Kakade (Harvard University), Eran Malach (Hebrew University of Jerusalem), Cyril Zhang (Microsoft Research)
Pseudocode | No | No pseudocode or algorithm block found.
Open Source Code | No | The paper includes a checklist item stating that code is included, but the main body of the paper does not provide an explicit statement or URL for source code availability.
Open Datasets | No | The paper uses a synthetic dataset generated from a uniform distribution (Unif({-1, 1}^n)), but does not provide a link, DOI, or specific citation for public access to a pre-existing dataset.
Dataset Splits | No | The paper mentions "validated on a batch of 2^13 samples" and "validation error", but does not specify explicit dataset split percentages, absolute sample counts for splits, or a reference to predefined standard splits for reproducibility.
Hardware Specification | No | The paper does not mention the specific hardware (e.g., GPU models, CPU types, or other machine specifications) used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch (in the references) and the Adam optimizer, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | For all combinations of n in {10, 20, 30}, k in {2, 3, 4}, batch sizes B in {1, 2, 4, ..., 1024}, initializations {uniform, Gaussian, Bernoulli}, loss functions {hinge, square, cross entropy}, and architecture configurations {(i), (ii), ..., (xv)}, SGD solved the parity problem (with 100% accuracy, validated on a batch of 2^13 samples) in at least 20% of 25 random trials, for at least one choice of learning rate in {0.001, 0.01, 0.1, 1}. (The full grid is sketched after the table.)
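For concreteness, here is a minimal sketch of the sparse-parity setup described in the Research Type and Open Datasets rows: inputs drawn from Unif({-1, 1}^n), labels given by the product of k hidden coordinates, and a small network trained with SGD. This is not the authors' code; the width (128), learning rate (0.1), batch size (32), iteration budget, and choice of hidden subset are illustrative assumptions.

```python
# Hypothetical sketch of the (n, k)-sparse parity task; not the paper's code.
import torch

n, k = 30, 3                      # instance size and sparsity (values from the paper's grid)
secret = torch.arange(k)          # assumed hidden subset S; any k coordinates would do

def sample_batch(batch_size):
    """x ~ Unif({-1, +1}^n); label = product of the k secret coordinates."""
    x = torch.randint(0, 2, (batch_size, n)).float() * 2 - 1
    y = x[:, secret].prod(dim=1)  # +1 or -1
    return x, y

# One of many architectures the paper reports as succeeding; width 128 and the
# default PyTorch initialization are assumptions made for this sketch.
model = torch.nn.Sequential(
    torch.nn.Linear(n, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x_val, y_val = sample_batch(2 ** 13)  # validation batch size quoted in the table

for step in range(1, 100_001):
    x, y = sample_batch(32)
    out = model(x).squeeze(-1)
    loss = torch.clamp(1 - y * out, min=0).mean()  # hinge loss on +/-1 labels
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (torch.sign(model(x_val).squeeze(-1)) == y_val).float().mean().item()
        print(f"step {step:6d}  val acc {acc:.3f}")
        # Validation accuracy typically hovers near 0.5 for many steps and then
        # jumps to 1.0, i.e., the discontinuous phase transition described above.
```

Under this setup, the flat-then-sharp training curve is the "hidden progress" behavior described in the quoted abstract.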
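The Experiment Setup row describes a grid over problem sizes, batch sizes, initializations, losses, architectures, and learning rates. Below is a hedged sketch of how such a sweep could be enumerated; the paper does not publish sweep code, and the `config_i` labels are placeholders for its architecture configurations (i) through (xv).

```python
# Hypothetical enumeration of the sweep in the Experiment Setup row; the names
# and structure here are assumptions, not the authors' implementation.
import itertools

grid = {
    "n": [10, 20, 30],
    "k": [2, 3, 4],
    "batch_size": [2 ** i for i in range(11)],               # 1, 2, 4, ..., 1024
    "init": ["uniform", "gaussian", "bernoulli"],
    "loss": ["hinge", "square", "cross_entropy"],
    "architecture": [f"config_{i}" for i in range(1, 16)],   # placeholders for (i)-(xv)
    "lr": [0.001, 0.01, 0.1, 1.0],
}

for values in itertools.product(*grid.values()):
    run = dict(zip(grid.keys(), values))
    # Per the table: 25 random trials per setting; a setting counts as solved if
    # at least 20% of trials reach 100% accuracy on a 2^13-sample validation
    # batch, for at least one learning rate.
    ...  # training for this configuration would go here
```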