On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory

Authors: Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif (pp. 6661-6669)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We theoretically study the approximation of this algorithm and show experimentally that it is more efficient and accurate than competing approaches. Finally, we illustrate our approach on adversarial robustness." and Table 2: "This table shows the Accuracy under ℓ2 and ℓ∞ attacks of CIFAR10/100 datasets."
Researcher Affiliation | Academia | Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif; PSL, Université Paris-Dauphine, CNRS, LAMSADE, MILES Team, Paris, France; alexandre.araujo@dauphine.eu
Pseudocode | Yes | Algorithm 1 (PolyGrid): input: polynomial f, number of samples S; output: approximated maximum modulus of f; σ ← 0, ω1 ← 0, ϵ ← 2π/S; for i = 0 to S−1: ω1 ← ω1 + ϵ, ω2 ← 0; for j = 0 to S−1: ω2 ← ω2 + ϵ, σ ← max(σ, f(ω1, ω2)); return σ. (A Python sketch of this grid search follows the table.)
Open Source Code | No | The paper mentions 'supplementary material' but does not explicitly state that the source code for the methodology is openly provided, nor does it give a link.
Open Datasets | Yes | CIFAR10/100 Dataset: "For all our experiments, we use the Wide ResNet architecture introduced by Zagoruyko and Komodakis (2016) to train our classifiers." and Experimental Settings for ImageNet Dataset: "For all our experiments, we use the ResNet-101 architecture (He et al. 2016)."
Dataset Splits | No | The paper mentions training parameters and evaluation on a test set but does not explicitly detail the training/test/validation dataset splits, nor does it mention a dedicated validation set.
Hardware Specification | Yes | The comparison has been made on a Tesla V100 GPU.
Software Dependencies | No | The paper mentions the 'PyTorch CUDA profiler', which implies PyTorch, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We use Wide ResNet networks with 28 layers and a width factor of 10. We train our networks for 200 epochs with a batch size of 200. We use Stochastic Gradient Descent with a momentum of 0.9 and an initial learning rate of 0.1 with exponential decay of 0.1 (MultiStepLR, gamma = 0.1) after epochs 60, 120 and 160. For Adversarial Training (Madry et al. 2018), we use Projected Gradient Descent with ϵ = 8/255 (≈ 0.031), a step size of ϵ/5 (≈ 0.0062) and 10 iterations; we use a random initialization but run the attack only once. (See the PyTorch sketch below.)
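
The PolyGrid pseudocode quoted above is a plain grid search over [0, 2π]². Below is a minimal Python sketch under stated assumptions: the modulus of f is taken at each grid point (implied by the stated output, "approximated maximum modulus of f"), and the example polynomial built from a random 3×3 kernel is purely illustrative, not taken from the paper.

import numpy as np

def poly_grid(f, S):
    # Grid-search approximation of max |f(w1, w2)| over [0, 2*pi]^2,
    # following the quoted PolyGrid pseudocode (Algorithm 1).
    sigma, eps = 0.0, 2 * np.pi / S
    w1 = 0.0
    for _ in range(S):
        w1 += eps
        w2 = 0.0
        for _ in range(S):
            w2 += eps
            sigma = max(sigma, abs(f(w1, w2)))  # modulus assumed, per the stated output
    return sigma

# Illustrative use: f as the 2-D trigonometric polynomial of a random 3x3 kernel
# (a stand-in for a convolutional filter; not the paper's experiment).
K = np.random.randn(3, 3)
f = lambda w1, w2: sum(K[i, j] * np.exp(1j * (i * w1 + j * w2))
                       for i in range(3) for j in range(3))
print(poly_grid(f, S=100))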
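
For the Experiment Setup row, the following is a short PyTorch sketch of the reported optimizer, learning-rate schedule and PGD attack settings. The placeholder linear model, the pgd_attack helper and all variable names are assumptions for illustration, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder standing in for the Wide ResNet 28-10 classifier (assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# SGD, momentum 0.9, lr 0.1, decayed by 0.1 at epochs 60, 120 and 160.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.1)

def pgd_attack(model, x, y, eps=8 / 255, step=(8 / 255) / 5, iters=10):
    # PGD with random start, run once, as in the quoted setup.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        F.cross_entropy(model(x + delta), y).backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()

In the full training loop, scheduler.step() would be called once per epoch over the 200 reported epochs, with batches of size 200.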