Investigating Generalization by Controlling Normalized Margin
Authors: Alexander R Farhang, Jeremy D Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. |
| Researcher Affiliation | Collaboration | Alexander R. Farhang (Caltech), Jeremy Bernstein (Caltech), Kushal Tirumala (Caltech), Yang Liu (Argo AI), Yisong Yue (Caltech, Argo AI). |
| Pseudocode | Yes | Recipe 1: Controlling Frobenius-normalized margin $\gamma_F$. The recipe targets $\gamma_F(x_i, y_i; w) = \alpha_i$ across training points $\{x_i, y_i\}_{i=1}^{n}$ for an $L$-layer MLP $f^L(x; w)$. (A minimal code sketch of this recipe appears below the table.) |
| Open Source Code | Yes | Code available at: https://github.com/alexfarhang/margin. |
| Open Datasets | Yes | Two sets of experiments were performed, each of which trained two MLPs on 1000-point subsets of MNIST for 10-class classification with either true or random labels. For MNIST 0 vs. 1 classification, the training set size was 12665 and the test set size was 2115. For CIFAR-10 dog vs. ship, the training set size was 10000 and the test set size was 2000. |
| Dataset Splits | No | For MNIST 0 vs. 1 classification, the training set size was 12665 and the test set size was 2115. For MNIST 4 vs. 7 classification, the training set size was 12107 and the test set size was 2010. For MNIST 3 vs. 8 classification, the training set size was 11982 and the test set size was 1984. For CIFAR-10 dog vs. ship, the training set size was 10000 and the test set size was 2000. (A sketch of how these binary subsets can be built is given below the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | This paper employs the Nero optimizer (Liu et al., 2021)... |
| Experiment Setup | Yes | Depth-5, width-5000 fully connected neural networks were trained for 10-class classification on subsets of 1000 training points from MNIST... Rectified linear unit (ReLU) activations were used throughout all experiments. ...trained with a label-scaled squared loss function... full-batch gradient descent with a learning rate of 0.01 and an exponential learning rate decay of 0.999... trained with Frobenius control using the Nero optimizer (learning rate: 0.01, Nero β: 0.999)... 2-layer MLPs were trained for 10-class classification on 1000-point subsets of MNIST. ...Networks were trained for between 50,000 and 250,000 epochs (learning rates between 0.9998 and 0.999998). (A training-loop sketch follows the table.) |
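
The recipe quoted in the Pseudocode row is only summarized above. As a rough illustration, here is a minimal sketch of how the Frobenius-normalized margin $\gamma_F$ could be measured for an MLP; the definition used here (classification margin divided by the product of per-layer Frobenius norms) and the helper names are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of measuring the Frobenius-normalized
# margin gamma_F of an L-layer, bias-free ReLU MLP.  The definition used here,
# classification margin divided by the product of per-layer Frobenius norms,
# is an assumption inferred from the table's description of Recipe 1.
import torch
import torch.nn as nn


def make_mlp(depth: int, width: int, d_in: int, d_out: int) -> nn.Sequential:
    """Plain bias-free ReLU MLP f^L(x; w) with `depth` linear layers."""
    layers, d = [], d_in
    for _ in range(depth - 1):
        layers += [nn.Linear(d, width, bias=False), nn.ReLU()]
        d = width
    layers += [nn.Linear(d, d_out, bias=False)]
    return nn.Sequential(*layers)


def frobenius_normalized_margin(model: nn.Sequential,
                                x: torch.Tensor,
                                y: torch.Tensor) -> torch.Tensor:
    """Per-example gamma_F(x_i, y_i; w): margin scaled by prod_l ||W_l||_F."""
    logits = model(x)                                      # shape (n, classes)
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, y.unsqueeze(1), float("-inf"))      # hide the true class
    margin = true_logit - masked.max(dim=1).values         # standard margin
    norm_product = torch.ones(())
    for module in model:
        if isinstance(module, nn.Linear):
            norm_product = norm_product * module.weight.norm(p="fro")
    return margin / norm_product
```

Recipe 1 then asks that this quantity equal a prescribed target $\alpha_i$ on every training point; the Experiment Setup row suggests this is realized with a label-scaled squared loss together with weight-norm control ("Frobenius control using the Nero optimizer"), rather than by optimizing the ratio directly.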
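
The binary tasks listed in the Dataset Splits row (MNIST 0 vs. 1, 4 vs. 7, 3 vs. 8 and CIFAR-10 dog vs. ship) correspond to filtering the standard train/test splits down to the two chosen classes; for example, the quoted 12665/2115 split for 0 vs. 1 is exactly the number of 0s and 1s in standard MNIST. The sketch below, using torchvision, is an assumption about the data pipeline rather than the paper's actual loading code.

```python
# Sketch of building the binary subsets quoted in the table from torchvision's
# standard splits.  The filtering/relabelling approach is an assumption, not
# taken from the paper's released code.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms


def binary_subset(dataset, class_a: int, class_b: int):
    """Keep only `class_a` / `class_b` examples and relabel them 0 / 1."""
    targets = torch.as_tensor(dataset.targets)
    keep = ((targets == class_a) | (targets == class_b)).nonzero(as_tuple=True)[0]
    labels = (targets[keep] == class_b).long()   # 0 for class_a, 1 for class_b
    return Subset(dataset, keep.tolist()), labels


to_tensor = transforms.ToTensor()

# MNIST 0 vs. 1: filtering the standard splits gives the quoted
# 12665 training and 2115 test points.
mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
train_01, labels_01 = binary_subset(mnist_train, 0, 1)

# CIFAR-10 dog vs. ship (class indices 5 and 8): 10000 training points,
# matching the table.
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
train_dog_ship, labels_dog_ship = binary_subset(cifar_train, 5, 8)
```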
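
The Experiment Setup row describes, among other configurations, depth-5, width-5000 ReLU MLPs trained on 1000-point MNIST subsets with full-batch gradient descent (learning rate 0.01, exponential decay 0.999) and a label-scaled squared loss. A rough sketch of how that configuration could be wired together follows; the exact form of the label scaling (one-hot targets multiplied by a margin target `alpha`) and the use of plain SGD are illustrative assumptions.

```python
# Sketch of the quoted full-batch setup: a depth-5, width-5000 ReLU MLP on a
# 1000-point MNIST subset, squared loss on scaled labels, gradient descent with
# learning rate 0.01 and exponential decay 0.999.  The one-hot-times-alpha
# target and the plain SGD optimizer are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder 1000-point MNIST subset (flattened 28x28 images); see the
# dataset sketch above for real loading code.
x = torch.randn(1000, 784)
y = torch.randint(0, 10, (1000,))

depth, width, alpha = 5, 5000, 1.0   # alpha: assumed margin target

layers, d = [], 784
for _ in range(depth - 1):
    layers += [nn.Linear(d, width, bias=False), nn.ReLU()]
    d = width
layers += [nn.Linear(d, 10, bias=False)]
model = nn.Sequential(*layers)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # full-batch GD
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

targets = alpha * F.one_hot(y, num_classes=10).float()     # label-scaled targets

num_epochs = 1000   # placeholder; the table quotes 50,000-250,000 epochs for the 2-layer runs
for epoch in range(num_epochs):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), targets)    # squared loss on scaled labels
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Swapping `torch.optim.SGD` for the Nero optimizer (Liu et al., 2021) cited in the Software Dependencies row would correspond to the "Frobenius control" variant quoted above.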