On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory
Authors: Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif
AAAI 2021, pp. 6661-6669
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically study the approximation of this algorithm and show experimentally that it is more efficient and accurate than competing approaches. Finally, we illustrate our approach on adversarial robustness. and Table 2: This table shows the Accuracy under ℓ2 and ℓ∞ attacks of CIFAR10/100 datasets. |
| Researcher Affiliation | Academia | Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif. PSL, Université Paris-Dauphine, CNRS, LAMSADE, MILES Team, Paris, France. alexandre.araujo@dauphine.eu |
| Pseudocode | Yes | Algorithm 1 PolyGrid. 1: input polynomial f, number of samples S; 2: output approximated maximum modulus of f; 3: σ ← 0, ω1 ← 0, ϵ ← 2π/S; 4: for i = 0 to S − 1 do; 5: ω1 ← ω1 + ϵ, ω2 ← 0; 6: for j = 0 to S − 1 do; 7: ω2 ← ω2 + ϵ; 8: σ ← max(σ, f(ω1, ω2)); 9: end for; 10: end for; 11: return σ (see the Python sketch after this table) |
| Open Source Code | No | The paper mentions 'supplementary material' but does not explicitly state that the source code for the methodology is openly available, nor does it provide a link. |
| Open Datasets | Yes | CIFAR10/100 Dataset: For all our experiments, we use the Wide ResNet architecture introduced by Zagoruyko and Komodakis (2016) to train our classifiers. and Experimental Settings for ImageNet Dataset: For all our experiments, we use the ResNet-101 architecture (He et al. 2016). |
| Dataset Splits | No | The paper mentions training parameters and evaluation on a test set but does not explicitly detail the training/test/validation dataset splits, nor does it mention a dedicated validation set. |
| Hardware Specification | Yes | The comparison has been made on a Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch CUDA profiler', which implies PyTorch, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We use Wide ResNet networks with 28 layers and a width factor of 10. We train our networks for 200 epochs with a batch size of 200. We use Stochastic Gradient Descent with a momentum of 0.9, an initial learning rate of 0.1 with exponential decay of 0.1 (MultiStepLR, gamma = 0.1) after epochs 60, 120 and 160. For Adversarial Training (Madry et al. 2018), we use Projected Gradient Descent with ϵ = 8/255 (≈ 0.031), a step size of ϵ/5 (≈ 0.0062) and 10 iterations; we use a random initialization but run the attack only once. (These settings are collected in the PyTorch sketch below.) |
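
The Algorithm 1 (PolyGrid) pseudocode quoted in the table maps onto a short Python routine. The sketch below is a minimal interpretation, assuming `f` is a callable returning a (possibly complex) scalar and that the modulus |f(ω1, ω2)| is what is maximized, since the algorithm's stated output is a maximum modulus; the toy polynomial in the usage line is illustrative, not taken from the paper.

```python
import numpy as np

def poly_grid(f, num_samples):
    """Approximate the maximum modulus of a two-variable trigonometric
    polynomial f(w1, w2) over [0, 2*pi]^2 on a uniform grid, following the
    Algorithm 1 (PolyGrid) pseudocode quoted in the table above."""
    sigma = 0.0
    eps = 2.0 * np.pi / num_samples
    w1 = 0.0
    for _ in range(num_samples):
        w1 += eps
        w2 = 0.0
        for _ in range(num_samples):
            w2 += eps
            # The modulus |f| is taken here because the stated output is a
            # maximum modulus (assumption; the bars are not in the quoted text).
            sigma = max(sigma, abs(f(w1, w2)))
    return sigma

# Usage with a toy (hypothetical) trigonometric polynomial.
f = lambda w1, w2: 1.0 + 0.5 * np.exp(1j * w1) + 0.25 * np.exp(1j * (w1 + w2))
print(poly_grid(f, num_samples=100))  # ~1.75, attained at w1 = w2 = 2*pi
```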
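
For readers reproducing the Experiment Setup row, the following PyTorch sketch collects the quoted hyper-parameters (SGD with momentum 0.9, MultiStepLR decay of 0.1 at epochs 60/120/160, PGD with ϵ = 8/255, step size ϵ/5, 10 iterations, random initialization, a single run of the attack). The stand-in network and random batch are placeholders only; the actual Wide ResNet 28-10 architecture and training loop are not reproduced here.

```python
import torch
import torch.nn as nn

def make_optimizer(model: nn.Module):
    # SGD, momentum 0.9, initial learning rate 0.1, decayed by a factor of 0.1
    # (MultiStepLR, gamma = 0.1) after epochs 60, 120 and 160.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 120, 160], gamma=0.1)
    return optimizer, scheduler

def pgd_attack(model, x, y, eps=8 / 255, steps=10):
    """PGD with eps = 8/255, step size eps/5, 10 iterations, random
    initialization, and a single run of the attack (no restarts)."""
    step_size = eps / 5  # ~0.0062
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random initialization
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps).detach()
        delta.requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()

# Minimal usage with a tiny stand-in network and a random CIFAR-shaped batch;
# the stand-in is hypothetical and is not the Wide ResNet 28-10 from the paper.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
optimizer, scheduler = make_optimizer(model)
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = pgd_attack(model, x, y)
```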