Investigating Generalization by Controlling Normalized Margin

Authors: Alexander R Farhang, Jeremy D Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions.
Researcher Affiliation | Collaboration | Alexander R. Farhang (Caltech), Jeremy Bernstein (Caltech), Kushal Tirumala (Caltech), Yang Liu (Argo AI), Yisong Yue (Caltech, Argo AI).
Pseudocode | Yes | Recipe 1: Controlling Frobenius-normalized margin γ_F. The recipe targets γ_F(x_i, y_i; w) = α_i across training points {(x_i, y_i)}_{i=1}^n for an L-layer MLP f^L(x; w). (A hedged sketch of margin control in this spirit follows the table.)
Open Source Code | Yes | Code available at: https://github.com/alexfarhang/margin.
Open Datasets | Yes | Two sets of experiments were performed, each of which trained two MLPs on 1000-point subsets of MNIST to classify either true or randomly labeled data for 10-class classification. For MNIST 0 vs. 1 classification, the training set size was 12665 and the test set size was 2115. For CIFAR-10 dog vs. ship, the training set size was 10000 and the test set size was 2000.
Dataset Splits | No | For MNIST 0 vs. 1 classification, the training set size was 12665 and the test set size was 2115. For MNIST 4 vs. 7 classification, the training set size was 12107 and the test set size was 2010. For MNIST 3 vs. 8 classification, the training set size was 11982 and the test set size was 1984. For CIFAR-10 dog vs. ship, the training set size was 10000 and the test set size was 2000.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | This paper employs the Nero optimizer (Liu et al., 2021)...
Experiment Setup | Yes | Depth-5, width-5000 fully connected neural networks were trained for 10-class classification on subsets of 1000 training points from MNIST... Rectified Linear Unit (ReLU) activations were used throughout all experiments. ...trained with a label-scaled squared loss function... full-batch gradient descent with a learning rate of 0.01 and an exponential learning rate decay of 0.999... trained with Frobenius control using the Nero optimizer (learning rate: 0.01, Nero β: 0.999)... 2-layer MLPs were trained for 10-class classification on 1000-point subsets of MNIST. ...Networks were trained for between 50,000 and 250,000 epochs (learning rates between 0.9998 and 0.999998). (A hedged sketch of this setup follows the table.)
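
As a concrete reading of the Pseudocode row, the sketch below computes a Frobenius-normalized margin and penalizes its deviation from per-example targets α_i. It assumes the common definition of normalized margin as the output margin divided by the product of per-layer Frobenius norms; the names (`frobenius_normalized_margin`, `margin_control_loss`, `alpha`) and the direct squared penalty are illustrative assumptions, not the paper's Recipe 1, which instead combines a label-scaled squared loss with the Nero optimizer.

```python
# Hedged sketch: Frobenius-normalized margin and a loss that drives it toward
# per-example targets alpha_i. Assumes margin = (true-class logit minus best
# other logit) / product of per-layer Frobenius norms; naming is illustrative
# and may differ from the authors' implementation.
import torch
import torch.nn as nn

def frobenius_normalized_margin(model: nn.Sequential, x, y):
    """Per-example output margin divided by the product of layer Frobenius norms."""
    logits = model(x)                                    # (batch, classes)
    true = logits.gather(1, y.unsqueeze(1)).squeeze(1)   # logit of the true class
    other = logits.clone()
    other.scatter_(1, y.unsqueeze(1), float("-inf"))     # mask out the true class
    margin = true - other.max(dim=1).values              # unnormalized margin
    norm_prod = torch.ones((), device=x.device)
    for layer in model:
        if isinstance(layer, nn.Linear):
            norm_prod = norm_prod * layer.weight.norm(p="fro")
    return margin / norm_prod

def margin_control_loss(model, x, y, alpha):
    """Squared deviation of the normalized margin from its target alpha_i."""
    gamma = frobenius_normalized_margin(model, x, y)
    return ((gamma - alpha) ** 2).mean()
```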
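
To make the Experiment Setup row concrete, the following sketch builds a depth-5, width-5000 ReLU MLP and trains it with a label-scaled squared loss under full-batch gradient descent (learning rate 0.01, exponential decay 0.999), as quoted above. Data handling, the `label_scale` value, and plain SGD in place of the authors' Nero optimizer are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the quoted setup: depth-5, width-5000 ReLU MLP, label-scaled
# squared loss, full-batch gradient descent with lr 0.01 and exponential decay
# 0.999, on a 1000-point MNIST subset. Data loading and label_scale are assumed.
import torch
import torch.nn as nn

def make_mlp(depth=5, width=5000, in_dim=28 * 28, num_classes=10):
    layers, dim = [], in_dim
    for _ in range(depth - 1):
        layers += [nn.Linear(dim, width), nn.ReLU()]
        dim = width
    layers.append(nn.Linear(dim, num_classes))
    return nn.Sequential(*layers)

def train(x, y, label_scale=1.0, epochs=1000):
    """x: (1000, 784) flattened inputs, y: (1000,) integer labels; full batch."""
    model = make_mlp()
    targets = label_scale * nn.functional.one_hot(y, 10).float()  # label-scaled targets
    opt = torch.optim.SGD(model.parameters(), lr=0.01)            # plain gradient descent
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.999)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(x) - targets) ** 2).mean()                 # squared loss
        loss.backward()
        opt.step()
        sched.step()
    return model
```

Swapping SGD for the Nero optimizer (from the authors' repository) with learning rate 0.01 and β = 0.999 would move this sketch closer to the Frobenius-control runs described in the quoted setup.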