Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration

Authors: Blaise Delattre, Quentin Barthélemy, Alexandre Araujo, Alexandre Allauzen

ICML 2023

Reproducibility variables, each with its result and the supporting LLM response:
Research Type: Experimental. "In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to power iteration. Called the Gram iteration, our approach exhibits superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches." A minimal sketch of the core recursion follows below.
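To make the abstract's claim concrete, here is a minimal PyTorch sketch of Gram iteration for a dense matrix, written from the description above; the function name gram_iteration_bound and the log-domain rescaling bookkeeping are our own framing, not code from the authors' repository.

```python
import torch

def gram_iteration_bound(W, n_iter=6):
    """Differentiable upper bound on the spectral norm sigma(W).

    Minimal sketch of Gram iteration: G <- G^H G squares all singular
    values, so after N steps sigma(W)^(2^N) = sigma(G_N) <= ||G_N||_F.
    Rescaling by the Frobenius norm each step (tracked in log domain)
    prevents overflow; this bookkeeping is an illustrative choice.
    """
    G = W.clone()
    log_scale = 0.0  # log of the cumulative rescaling factor
    for _ in range(n_iter):
        fro = torch.linalg.norm(G)          # Frobenius norm of current iterate
        log_scale = 2.0 * (log_scale + torch.log(fro))
        G = G / fro
        G = G.conj().T @ G                  # Gram step: squares singular values
    # sigma(W) <= ||G_N||_F ** (1 / 2**N); undo the rescaling in log domain
    log_bound = (torch.log(torch.linalg.norm(G)) + log_scale) / 2.0 ** n_iter
    return torch.exp(log_bound)
```

For a random matrix, the result can be checked against torch.linalg.matrix_norm(W, ord=2); six iterations already raise the singular values to the power 2^6 = 64, which is what produces the superlinear convergence claimed in the abstract.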
Researcher Affiliation: Collaboration. 1 FOXSTREAM, Vaulx-en-Velin, France; 2 Miles Team, LAMSADE, Université Paris-Dauphine, PSL University, Paris, France; 3 New York University; 4 ESPCI PSL, Paris, France.
Pseudocode: Yes. Algorithm 1: Power_iteration(G, Niter); Algorithm 2: Lip_dense(G, Niter); Algorithm 3: Lip_conv(K, Niter); Algorithm 4: Gram_iteration_naive(G, Niter). A power-iteration sketch for comparison follows below.
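For contrast with the paper's Algorithm 1, a minimal sketch of classical power iteration on the Gram matrix G^T G; variable names are ours, and unlike Gram iteration this converges only linearly and yields an estimate rather than a certified upper bound.

```python
import torch

def power_iteration(G, n_iter=100):
    # Classical power iteration on G^T G: u converges to the top right
    # singular vector, so ||G u|| approaches the largest singular value.
    u = torch.randn(G.shape[1])
    for _ in range(n_iter):
        u = G.T @ (G @ u)
        u = u / torch.linalg.norm(u)
    return torch.linalg.norm(G @ u)
```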
Open Source Code: Yes. "For research reproducibility, the code is available at https://github.com/blaisedelattre/lip4conv."
Open Datasets: Yes. "Inspired by (Singla et al., 2021), this experiment estimates the Lipschitz constant for each convolutional layer of a ResNet18 (He et al., 2016), pre-trained on the ImageNet-1k dataset." "We use the ResNet18 architecture (He et al., 2016), trained on the CIFAR-10 dataset for 200 epochs, and with a batch size of 256." An illustrative per-layer sketch follows below.
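As an illustration of how the per-layer experiment could be set up, the sketch below loads a pretrained ResNet18 from torchvision and bounds each convolution through the circulant (DFT) view the paper builds on: the 2-D FFT of the zero-padded kernel block-diagonalizes circular convolution into one c_out x c_in matrix per frequency, and the largest per-frequency spectral norm bounds the layer. It reuses the gram_iteration_bound sketch above; the circular-padding assumption, the fixed resolution n=32, and the helper name conv_spectral_bound are ours.

```python
import torch
from torchvision.models import resnet18

def conv_spectral_bound(kernel, n, n_iter=6):
    # kernel: (c_out, c_in, k, k). Its 2-D DFT, padded to the n x n input
    # resolution, block-diagonalizes circular convolution into n*n blocks
    # of shape (c_out, c_in); the layer's norm is the max over blocks.
    D = torch.fft.fft2(kernel, s=(n, n))                  # (c_out, c_in, n, n)
    D = D.permute(2, 3, 0, 1).reshape(n * n, *kernel.shape[:2])
    return max(gram_iteration_bound(block, n_iter) for block in D)

model = resnet18(weights="IMAGENET1K_V1").eval()
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        # n=32 keeps the loop cheap; the paper evaluates at each layer's
        # true input resolution.
        bound = conv_spectral_bound(module.weight.detach(), n=32)
        print(f"{name}: sigma <= {bound.item():.3f}")
```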
Dataset Splits: No. The paper mentions using the CIFAR-10 and ImageNet-1k datasets but does not explicitly provide train/validation/test split details (e.g., percentages or sample counts for each split).
Hardware Specification: Yes. "All experiments were done on one NVIDIA RTX A6000 GPU. Trainings are repeated four times on 4 V100 GPUs."
Software Dependencies: No. The paper mentions using PyTorch for SVD operations but does not specify a PyTorch version number or any other software dependencies.
Experiment Setup: Yes. "We use the ResNet18 architecture (He et al., 2016), trained on the CIFAR-10 dataset for 200 epochs, and with a batch size of 256. We use SGD with a momentum of 0.9 and an initial learning rate of 0.1 with a cosine annealing schedule." A configuration sketch follows below.
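A minimal PyTorch sketch of that reported configuration, without the paper's Lipschitz-regularization term; the dataset path and transform are placeholder assumptions.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Reported setup: ResNet18 on CIFAR-10, 200 epochs, batch size 256,
# SGD with momentum 0.9, initial lr 0.1, cosine annealing schedule.
model = resnet18(num_classes=10)
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=256, shuffle=True)
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch
```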