Efficient Training of Low-Curvature Neural Networks

Authors: Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, François Fleuret

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we perform experiments to (1) evaluate the effectiveness of our proposed method in training models with low curvature as originally intended, (2) evaluate whether low curvature models have robust gradients in practice, and (3) evaluate the effectiveness of low-curvature models for adversarial robustness. Our experiments are primarily conducted on a base ResNet-18 architecture ([28]) using the CIFAR10 and CIFAR100 datasets ([29]), and using the PyTorch [30] framework.
Researcher Affiliation | Academia | Suraj Srinivas (Harvard University, ssrinivas@seas.harvard.edu); Kyle Matoba (Idiap Research Institute & EPFL, kyle.matoba@epfl.ch); Himabindu Lakkaraju (Harvard University, hlakkaraju@hbs.edu); François Fleuret (University of Geneva, francois.fleuret@unige.ch)
Pseudocode | No | The paper mentions providing a "PyTorch-style code snippet in the appendix" but does not explicitly label it as "Pseudocode" or an "Algorithm" block within the main text.
Open Source Code | Yes | Code to implement our method and replicate our experiments is available at https://github.com/kylematoba/lcnn.
Open Datasets | Yes | Our experiments are primarily conducted on a base ResNet-18 architecture ([28]) using the CIFAR10 and CIFAR100 datasets ([29])
Dataset Splits | No | The paper specifies training details but does not explicitly mention a validation dataset split or how it was used for model selection or hyperparameter tuning.
Hardware Specification | Yes | Our methods entailed fairly modest computation: our most involved computations can be completed in under three GPU-days, and all experimental results could be computed in less than 60 GPU-days. We used a mixture of GPUs, primarily NVIDIA GeForce GTX 1080 Tis, on an internal compute cluster.
Software Dependencies | No | Our experiments are primarily conducted on a base ResNet-18 architecture ([28]) using the CIFAR10 and CIFAR100 datasets ([29]), and using the PyTorch [30] framework. ... We use the Cleverhans library [33] to implement PGD. The paper mentions software such as PyTorch and Cleverhans but does not specify their version numbers.
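The PGD attack referenced above follows a standard recipe: repeated signed-gradient steps, each followed by projection back onto an ε-ball around the original input. A minimal one-dimensional pure-Python sketch of that loop is below; the function name, step size `alpha`, radius `eps`, and step count are illustrative choices, not the paper's or Cleverhans' settings.

```python
def pgd_attack(x0, grad_fn, eps=0.03, alpha=0.01, steps=10):
    """Toy 1-D sketch of projected gradient descent (here, ascent on a loss):
    take a signed-gradient step of size alpha, then project the iterate
    back onto the L-infinity ball [x0 - eps, x0 + eps].
    All hyperparameter values are illustrative, not the paper's."""
    x = x0
    for _ in range(steps):
        g = grad_fn(x)
        # Signed-gradient step (the L-infinity steepest-ascent direction).
        x = x + (alpha if g > 0 else (-alpha if g < 0 else 0.0))
        # Projection step: clip back into the eps-ball around x0.
        x = max(x0 - eps, min(x0 + eps, x))
    return x
```

With a loss whose gradient is always positive, the iterate climbs until the projection pins it at `x0 + eps`, which is the saturating behavior one expects from an L-infinity-bounded attack.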
Experiment Setup | Yes | All our models are trained for 200 epochs with an SGD + momentum optimizer, with a momentum of 0.9 and an initial learning rate of 0.1 which decays by a factor of 10 at 150 and 175 epochs, and a weight decay of 5 × 10⁻⁴.
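The learning-rate schedule quoted above is a plain step schedule, which can be sketched as a small function; the function name and signature here are illustrative, but the numbers (base LR 0.1, decay by 10 at epochs 150 and 175) are the ones the paper states.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(150, 175), gamma=0.1):
    """Step schedule from the paper's setup: start at base_lr and
    multiply by gamma at each milestone epoch that has been reached.
    (Illustrative helper; the paper uses PyTorch's built-in machinery.)"""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

In PyTorch this is equivalent to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 175], gamma=0.1)` wrapped around `torch.optim.SGD(..., lr=0.1, momentum=0.9, weight_decay=5e-4)`.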