On the generalization of learning algorithms that do not converge

Authors: Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically validate the main ideas of Sections 3 and 4 on VGG16 and ResNet18 models trained on the CIFAR10 dataset (see Appendix D for further numerical results, [Chandramoorthy and Loukas, 2023] for the code). For all our experiments, ϕ_S is an SGD update with momentum 0.9, fixed learning rate 0.01, and batch size of 128. In all figures, time indicates the number of epochs. We generate different versions of the training set S_p by corrupting CIFAR10's labels with probability p, with S_0 being the original CIFAR10 dataset. Figures 2 and 3 show results corresponding to p = 0, 0.1, 0.17, 0.25, and 0.5. Each line in Figure 3 is a sample mean over 10 random initializations. (A hedged sketch of this label-corruption step appears after this table.)
Researcher Affiliation | Collaboration | Nisha Chandramoorthy (Institute for Data, Systems and Society, Massachusetts Institute of Technology, nishac@mit.edu); Andreas Loukas (Prescient Design, Genentech, Roche, andreas.loukas@roche.com); Khashayar Gatmiry (Electrical Engineering and Computer Science, Massachusetts Institute of Technology, gatmiry@mit.edu); Stefanie Jegelka (Electrical Engineering and Computer Science, Massachusetts Institute of Technology, stefje@mit.edu)
Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | See [Chandramoorthy and Loukas, 2023] for the code.
Open Datasets | Yes | We numerically validate the main ideas of Sections 3 and 4 on VGG16 and ResNet18 models trained on the CIFAR10 dataset.
Dataset Splits | No | The paper mentions training and testing on the CIFAR10 dataset but does not specify details about validation splits, percentages, or the methodology for creating training/validation/test sets.
Hardware Specification | No | The paper does not specify the exact hardware used for the experiments, such as specific GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | For all our experiments, ϕ_S is an SGD update with momentum 0.9, fixed learning rate 0.01, and batch size of 128. (A hedged training-loop sketch appears after this table.)
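
The label-corruption step quoted in the Research Type row admits a short illustration. The following is a minimal sketch, assuming a PyTorch/torchvision setup and a uniform-relabeling convention; the helper name corrupt_labels and the seed are hypothetical, and the paper's exact corruption scheme may differ.

    # Hedged sketch (not the authors' code): build the corrupted label sets S_p by
    # replacing each CIFAR10 training label, with probability p, by a uniformly
    # random class. Whether the new label may equal the original is an assumed convention.
    import numpy as np
    import torchvision

    def corrupt_labels(targets, p, num_classes=10, seed=0):
        """Return a copy of `targets` in which each label is resampled with probability p."""
        rng = np.random.default_rng(seed)
        targets = np.array(targets)
        flip = rng.random(len(targets)) < p
        targets[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
        return targets.tolist()

    train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
    corrupted = {p: corrupt_labels(train_set.targets, p)   # labels of S_p
                 for p in [0.0, 0.1, 0.17, 0.25, 0.5]}     # p = 0 leaves the original labels (S_0) unchanged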
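
The optimizer settings in the Experiment Setup row (SGD with momentum 0.9, fixed learning rate 0.01, batch size 128) can likewise be sketched. The snippet below is an illustration under assumptions, not the authors' training script: the framework (PyTorch), loss, transform, epoch count, and the stock torchvision ResNet18 (rather than a CIFAR-specific variant) are all guesses.

    # Hedged sketch of a training run matching only the stated hyperparameters:
    # SGD, momentum 0.9, fixed learning rate 0.01, batch size 128.
    import torch
    import torchvision
    import torchvision.transforms as T
    from torch.utils.data import DataLoader

    train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                             transform=T.ToTensor())
    loader = DataLoader(train_set, batch_size=128, shuffle=True)

    model = torchvision.models.resnet18(num_classes=10)   # or torchvision.models.vgg16(num_classes=10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(100):          # epoch count assumed; "time" in the figures means epochs
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()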