Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration

Authors: Blaise Delattre, Quentin Barthélemy, Alexandre Araujo, Alexandre Allauzen

ICML 2023

Reproducibility variables, each with its result and the supporting LLM response:
Research Type: Experimental. "In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to power iteration. Called the Gram iteration, our approach exhibits superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches." A minimal sketch of the core recursion follows below.
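To make the abstract's claim concrete, here is a minimal PyTorch sketch of Gram iteration for a dense matrix, written from the description above; the function name gram_iteration_bound and the log-domain rescaling bookkeeping are our own framing, not code from the authors' repository.

```python
import torch

def gram_iteration_bound(W, n_iter=6):
    """Differentiable upper bound on the spectral norm sigma(W).

    Minimal sketch of Gram iteration: G <- G^H G squares all singular
    values, so after N steps sigma(W)^(2^N) = sigma(G_N) <= ||G_N||_F.
    Rescaling by the Frobenius norm each step (tracked in log domain)
    prevents overflow; this bookkeeping is an illustrative choice.
    """
    G = W.clone()
    log_scale = 0.0  # log of the cumulative rescaling factor
    for _ in range(n_iter):
        fro = torch.linalg.norm(G)          # Frobenius norm of current iterate
        log_scale = 2.0 * (log_scale + torch.log(fro))
        G = G / fro
        G = G.conj().T @ G                  # Gram step: squares singular values
    # sigma(W) <= ||G_N||_F ** (1 / 2**N); undo the rescaling in log domain
    log_bound = (torch.log(torch.linalg.norm(G)) + log_scale) / 2.0 ** n_iter
    return torch.exp(log_bound)
```

For a random matrix, the result can be checked against torch.linalg.matrix_norm(W, ord=2); six iterations already raise the singular values to the power 2^6 = 64, which is what produces the superlinear convergence claimed in the abstract.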
Researcher Affiliation: Collaboration. 1 FOXSTREAM, Vaulx-en-Velin, France; 2 Miles Team, LAMSADE, Université Paris-Dauphine, PSL University, Paris, France; 3 New York University; 4 ESPCI PSL, Paris, France.
Pseudocode: Yes. Algorithm 1: Power_iteration(G, Niter); Algorithm 2: Lip_dense(G, Niter); Algorithm 3: Lip_conv(K, Niter); Algorithm 4: Gram_iteration_naive(G, Niter). A power-iteration sketch for comparison follows below.
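For contrast with the paper's Algorithm 1, a minimal sketch of classical power iteration on the Gram matrix G^T G; variable names are ours, and unlike Gram iteration this converges only linearly and yields an estimate rather than a certified upper bound.

```python
import torch

def power_iteration(G, n_iter=100):
    # Classical power iteration on G^T G: u converges to the top right
    # singular vector, so ||G u|| approaches the largest singular value.
    u = torch.randn(G.shape[1])
    for _ in range(n_iter):
        u = G.T @ (G @ u)
        u = u / torch.linalg.norm(u)
    return torch.linalg.norm(G @ u)
```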
Open Source Code: Yes. "For research reproducibility, the code is available at https://github.com/blaisedelattre/lip4conv."
Open Datasets: Yes. "Inspired by (Singla et al., 2021), this experiment estimates the Lipschitz constant for each convolutional layer of a ResNet18 (He et al., 2016), pre-trained on the ImageNet-1k dataset." "We use the ResNet18 architecture (He et al., 2016), trained on the CIFAR-10 dataset for 200 epochs, and with a batch size of 256." An illustrative per-layer sketch follows below.
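As an illustration of how the per-layer experiment could be set up, the sketch below loads a pretrained ResNet18 from torchvision and bounds each convolution through the circulant (DFT) view the paper builds on: the 2-D FFT of the zero-padded kernel block-diagonalizes circular convolution into one c_out x c_in matrix per frequency, and the largest per-frequency spectral norm bounds the layer. It reuses the gram_iteration_bound sketch above; the circular-padding assumption, the fixed resolution n=32, and the helper name conv_spectral_bound are ours.

```python
import torch
from torchvision.models import resnet18

def conv_spectral_bound(kernel, n, n_iter=6):
    # kernel: (c_out, c_in, k, k). Its 2-D DFT, padded to the n x n input
    # resolution, block-diagonalizes circular convolution into n*n blocks
    # of shape (c_out, c_in); the layer's norm is the max over blocks.
    D = torch.fft.fft2(kernel, s=(n, n))                  # (c_out, c_in, n, n)
    D = D.permute(2, 3, 0, 1).reshape(n * n, *kernel.shape[:2])
    return max(gram_iteration_bound(block, n_iter) for block in D)

model = resnet18(weights="IMAGENET1K_V1").eval()
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        # n=32 keeps the loop cheap; the paper evaluates at each layer's
        # true input resolution.
        bound = conv_spectral_bound(module.weight.detach(), n=32)
        print(f"{name}: sigma <= {bound.item():.3f}")
```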
Dataset Splits: No. The paper mentions using the CIFAR-10 and ImageNet-1k datasets but does not explicitly provide train/validation/test split details (e.g., percentages or sample counts for each split).
Hardware Specification: Yes. "All experiments were done on one NVIDIA RTX A6000 GPU. Trainings are repeated four times on 4 V100 GPUs."
Software Dependencies: No. The paper mentions using PyTorch for SVD operations but does not specify a PyTorch version number or any other software dependencies.
Experiment Setup: Yes. "We use the ResNet18 architecture (He et al., 2016), trained on the CIFAR-10 dataset for 200 epochs, and with a batch size of 256. We use SGD with a momentum of 0.9 and an initial learning rate of 0.1 with a cosine annealing schedule." A configuration sketch follows below.
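A minimal PyTorch sketch of that reported configuration, without the paper's Lipschitz-regularization term; the dataset path and transform are placeholder assumptions.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Reported setup: ResNet18 on CIFAR-10, 200 epochs, batch size 256,
# SGD with momentum 0.9, initial lr 0.1, cosine annealing schedule.
model = resnet18(num_classes=10)
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=256, shuffle=True)
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch
```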