Efficient Training of Low-Curvature Neural Networks
Authors: Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, François Fleuret
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we perform experiments to (1) evaluate the effectiveness of our proposed method in training models with low curvature as originally intended, (2) evaluate whether low curvature models have robust gradients in practice, and (3) evaluate the effectiveness of low-curvature models for adversarial robustness. Our experiments are primarily conducted on a base ResNet-18 architecture ([28]) using the CIFAR10 and CIFAR100 datasets ([29]), and using the PyTorch [30] framework. |
| Researcher Affiliation | Academia | Suraj Srinivas, Harvard University, ssrinivas@seas.harvard.edu; Kyle Matoba, Idiap Research Institute & EPFL, kyle.matoba@epfl.ch; Himabindu Lakkaraju, Harvard University, hlakkaraju@hbs.edu; François Fleuret, University of Geneva, francois.fleuret@unige.ch |
| Pseudocode | No | The paper mentions providing a "PyTorch-style code snippet in the appendix" but does not explicitly label it as "Pseudocode" or an "Algorithm" block within the main text. |
| Open Source Code | Yes | Code to implement our method and replicate our experiments is available at https://github.com/kylematoba/lcnn. |
| Open Datasets | Yes | Our experiments are primarily conducted on a base ResNet-18 architecture ([28]) using the CIFAR10 and CIFAR100 datasets ([29]). |
| Dataset Splits | No | The paper specifies training details but does not explicitly mention a validation dataset split or how it was used for model selection or hyperparameter tuning. |
| Hardware Specification | Yes | Our methods entailed fairly modest computation: our most involved computations can be completed in under three GPU days, and all experimental results could be computed in less than 60 GPU-days. We used a mixture of GPUs, primarily NVIDIA GeForce GTX 1080 Tis, on an internal compute cluster. |
| Software Dependencies | No | Our experiments are primarily conducted on a base ResNet-18 architecture ([28]) using the CIFAR10 and CIFAR100 datasets ([29]), and using the PyTorch [30] framework. ... We use the Cleverhans library [33] to implement PGD. The paper mentions software such as PyTorch and CleverHans but does not specify their version numbers (a hedged PGD sketch appears after the table). |
| Experiment Setup | Yes | All our models are trained for 200 epochs with an SGD + momentum optimizer, with a momentum of 0.9 and an initial learning rate of 0.1 which decays by a factor of 10 at epochs 150 and 175, and a weight decay of 5 × 10⁻⁴ (see the training sketch after the table). |
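
For concreteness, the training recipe quoted in the "Experiment Setup" row can be assembled in PyTorch roughly as follows. This is a minimal sketch based only on the hyperparameters reported above; the batch size, data augmentation, and the exact ResNet-18 variant are assumptions, and the authors' actual implementation is in the repository linked in the "Open Source Code" row.

```python
# Minimal sketch of the reported training setup; NOT the authors' code
# (see https://github.com/kylematoba/lcnn for the official implementation).
import torch
import torchvision
import torchvision.transforms as transforms

# Standard CIFAR-10 augmentation (assumed; the excerpt does not specify it).
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,  # batch size assumed
                                           shuffle=True, num_workers=4)

# torchvision's resnet18 is the ImageNet variant; CIFAR experiments usually
# swap in a 3x3 first conv and drop the initial max-pool, omitted here.
model = torchvision.models.resnet18(num_classes=10).cuda()

# Hyperparameters quoted in the "Experiment Setup" row: SGD with momentum 0.9,
# initial LR 0.1 decayed by 10x at epochs 150 and 175, weight decay 5e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 175], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(200):  # "trained for 200 epochs"
    for inputs, targets in train_loader:
        inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```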
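
Similarly, the "Software Dependencies" row notes that PGD is implemented with the CleverHans library but gives no version. Below is a hedged sketch of such a robustness evaluation, assuming the CleverHans 4.x PyTorch interface and a common L∞ attack budget (eps = 8/255 with 10 iterations, neither of which is stated in the excerpt).

```python
# Hedged PGD evaluation via CleverHans; the library version and the attack
# budget (eps, step size, iterations) are assumptions, not from the paper.
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
from cleverhans.torch.attacks.projected_gradient_descent import (
    projected_gradient_descent,
)

test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True,
                                        transform=transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128)

model.eval()  # `model` as trained in the previous sketch
correct = total = 0
for inputs, targets in test_loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    # Craft L-inf PGD adversarial examples (gradients are needed, so this
    # call must run outside torch.no_grad()).
    x_adv = projected_gradient_descent(model, inputs, eps=8 / 255,
                                       eps_iter=2 / 255, nb_iter=10,
                                       norm=np.inf, clip_min=0.0, clip_max=1.0)
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    correct += (preds == targets).sum().item()
    total += targets.numel()
print(f"PGD robust accuracy: {correct / total:.3f}")
```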