Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration
Authors: Blaise Delattre, Quentin Barthélemy, Alexandre Araujo, Alexandre Allauzen
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration. Called the Gram iteration, our approach exhibits a superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches. |
| Researcher Affiliation | Collaboration | (1) Foxstream, Vaulx-en-Velin, France; (2) Miles Team, LAMSADE, Université Paris-Dauphine, PSL University, Paris, France; (3) New York University; (4) ESPCI PSL, Paris, France. |
| Pseudocode | Yes | Algorithm 1: Power_iteration(G, Niter); Algorithm 2: Lip_dense(G, Niter); Algorithm 3: Lip_conv(K, Niter); Algorithm 4: Gram_iteration_naive(G, Niter). A hedged Gram-iteration sketch is given after the table. |
| Open Source Code | Yes | For research reproducibility, the code is available https://github.com/blaisedelattre/lip4conv. |
| Open Datasets | Yes | Inspired by (Singla et al., 2021), this experiment estimates the Lipschitz constant for each convolutional layer of a ResNet18 (He et al., 2016) pre-trained on the ImageNet-1k dataset. We use the ResNet18 architecture (He et al., 2016), trained on the CIFAR-10 dataset for 200 epochs with a batch size of 256. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet-1k datasets but does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | All experiments were run on one NVIDIA RTX A6000 GPU. Trainings are repeated four times on 4 V100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch for SVD operations but does not specify the version number of PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use the ResNet18 architecture (He et al., 2016), trained on the CIFAR-10 dataset for 200 epochs with a batch size of 256. We use SGD with a momentum of 0.9 and an initial learning rate of 0.1 with a cosine annealing schedule. A hedged training-setup sketch is given after the table. |
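
The algorithms listed in the Pseudocode row are defined in the paper; the snippet below is only a minimal sketch of the Gram-iteration idea for a dense matrix, written from the description quoted above (repeated Gram squaring with Frobenius rescaling). It is not the authors' released implementation; the function name and the log-space rescaling bookkeeping are assumptions. The linked repository contains the reference code, including the FFT-based extension to convolutional layers.

```python
import torch

def gram_iteration_bound(W: torch.Tensor, n_iter: int = 5) -> torch.Tensor:
    """Sketch: upper-bound sigma_max(W) by repeatedly squaring the Gram matrix.

    Uses lambda_max(W^T W) <= ||G_k||_F ** (1 / 2**k), where G_k is the Gram
    matrix squared k times; rescaling is tracked in log space to avoid overflow.
    """
    G = W.T @ W                                    # G_0 = W^T W
    log_scale = torch.zeros((), dtype=W.dtype)
    for _ in range(n_iter):
        fro = torch.linalg.norm(G)                 # Frobenius norm used for rescaling
        G = G / fro
        G = G.T @ G                                # Gram squaring step
        log_scale = 2 * (log_scale + torch.log(fro))
    # Undo the rescaling in log space and take the 2**n_iter-th root of ||G_k||_F
    log_lambda_max = (torch.log(torch.linalg.norm(G)) + log_scale) / 2 ** n_iter
    return torch.exp(0.5 * log_lambda_max)         # upper bound on sigma_max(W)

# Quick check against the exact spectral norm
W = torch.randn(64, 128, dtype=torch.float64)
print(gram_iteration_bound(W), torch.linalg.matrix_norm(W, ord=2))
```

In exact arithmetic the returned value equals ||G_k||_F raised to the power 1/2^(k+1), which upper-bounds the spectral norm of W and tightens as the number of Gram squarings grows.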
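
For the Experiment Setup row, a hedged PyTorch sketch of the training configuration follows. The hyperparameters (SGD, momentum 0.9, initial learning rate 0.1, cosine annealing, 200 epochs, batch size 256, ResNet18 on CIFAR-10) come from the quoted text; the data transforms and the use of torchvision's stock `resnet18` constructor are assumptions, since the paper does not specify them.

```python
import torch
import torchvision

# Hyperparameters quoted in the table row above; model/data details are assumptions.
model = torchvision.models.resnet18(num_classes=10)           # CIFAR-10 has 10 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)

for epoch in range(200):                                       # 200 epochs, cosine schedule
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```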