Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

Authors: Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our theoretical results, we conduct experiments on heavy-tailed min-max problems to demonstrate the importance of clipping when using non-adaptive methods such as SGDA or SEG. We train a Wasserstein GAN with gradient penalty [Gulrajani et al., 2017] on CIFAR-10 [Krizhevsky et al., 2009] using SGDA, clipped-SGDA, and clipped-SEG, and show the evolution of the gradient noise histograms during training. (A sketch of one way such gradient-noise histograms might be collected appears after the table.)
Researcher Affiliation | Academia | Eduard Gorbunov: MIPT, Russia; Mila & UdeM, Canada; MBZUAI, UAE. Marina Danilova: MIPT, Russia. David Dobre: Mila & UdeM, Canada. Pavel Dvurechensky: WIAS, Germany. Alexander Gasnikov: MIPT, Russia; HSE University, Russia; IITP RAS, Russia. Gauthier Gidel: Mila & UdeM, Canada; Canada CIFAR AI Chair.
Pseudocode | Yes | $x^{k+1} = x^k - \gamma_2 \widetilde{F}_{\xi_2^k}(\widetilde{x}^k)$, where $\widetilde{x}^k = x^k - \gamma_1 \widetilde{F}_{\xi_1^k}(x^k)$ (clipped-SEG), with $\widetilde{F}_{\xi_1^k}(x^k) = \operatorname{clip}\!\left(\frac{1}{m_{1,k}}\sum_{i=1}^{m_{1,k}} F_{\xi_1^{i,k}}(x^k),\ \lambda_{1,k}\right)$ and $\widetilde{F}_{\xi_2^k}(\widetilde{x}^k) = \operatorname{clip}\!\left(\frac{1}{m_{2,k}}\sum_{i=1}^{m_{2,k}} F_{\xi_2^{i,k}}(\widetilde{x}^k),\ \lambda_{2,k}\right)$, where $\{\xi_1^{i,k}\}_{i=1}^{m_{1,k}}$ and $\{\xi_2^{i,k}\}_{i=1}^{m_{2,k}}$ are independent samples from the distribution $\mathcal{D}$. (A minimal implementation sketch appears after the table.)
Open Source Code | Yes | Our codes are publicly available: https://github.com/busycalibrating/clipped-stochastic-methods.
Open Datasets | Yes | We train a Wasserstein GAN with gradient penalty [Gulrajani et al., 2017] on CIFAR-10 [Krizhevsky et al., 2009]... We train on FFHQ downsampled to 128×128 pixels [Karras et al., 2019].
Dataset Splits | No | The paper mentions training models on CIFAR-10 and FFHQ and conducting hyperparameter sweeps, but it does not explicitly provide percentages or absolute counts for training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., 'PyTorch 1.9'). It mentions adapting code from a 'publicly available WGAN-GP implementation' and 'pytorch-gan-collections' but without specific versioning.
Experiment Setup | Yes | We use the default architectures and training parameters specified in Gulrajani et al. [2017] ($\lambda_{GP} = 10$, $n_{dis} = 5$, learning rate decayed linearly to 0 over 100k steps)... We train on FFHQ downsampled to 128×128 pixels, and use the recommended StyleGAN2 hyperparameter configuration for this resolution (batch size = 32, γ = 0.1024, map depth = 2, channel multiplier = 16384). (These settings are restated as a config sketch after the table.)
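
The update rule quoted in the Pseudocode row can be made concrete with a short sketch. This is a minimal illustration and not the authors' implementation: the bilinear min-max instance, the Student-t noise model, the step sizes, clipping levels, and batch sizes are all assumptions chosen for readability.

```python
# Hedged sketch of the clipped-SEG update quoted in the Pseudocode row.
# Problem instance, noise model, and all hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim = 5
A = rng.standard_normal((dim, dim))


def clip(v, lam):
    """clip(v, lam) = min(1, lam / ||v||) * v  (identity when ||v|| <= lam)."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else (lam / norm) * v


def batch_operator(z, batch_size):
    """Mini-batch estimate of F(z) for min_x max_y x^T A y, i.e.
    F(x, y) = (A y, -A^T x), perturbed by heavy-tailed Student-t noise
    (the noise model is an assumption made for illustration)."""
    x, y = z[:dim], z[dim:]
    exact = np.concatenate([A @ y, -A.T @ x])
    noise = rng.standard_t(df=2.5, size=(batch_size, exact.size)).mean(axis=0)
    return exact + noise


z = rng.standard_normal(2 * dim)   # z^k = (x^k, y^k)
gamma1 = gamma2 = 0.05             # step sizes gamma_1, gamma_2
lam1 = lam2 = 1.0                  # clipping levels lambda_{1,k}, lambda_{2,k}
m1 = m2 = 16                       # batch sizes m_{1,k}, m_{2,k}

for k in range(2000):
    g1 = clip(batch_operator(z, m1), lam1)        # clipped estimate at z^k
    z_half = z - gamma1 * g1                      # extrapolation point
    g2 = clip(batch_operator(z_half, m2), lam2)   # clipped estimate at the extrapolation point
    z = z - gamma2 * g2                           # clipped-SEG step

residual = np.concatenate([A @ z[dim:], -A.T @ z[:dim]])
print("||F(z)|| after clipped-SEG:", np.linalg.norm(residual))
```

Clipped-SGDA, also referenced in the Research Type row, differs only in that the clipped update is applied directly at $x^k$, without the extrapolation step.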
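The Research Type row mentions tracking the evolution of gradient-noise histograms during WGAN-GP training. Below is a hedged sketch of one way such statistics could be gathered; the helper name `gradient_noise_norms` and the use of a full-batch gradient as the noise-free reference are assumptions for illustration, not the procedure used in the paper's code.

```python
# Hedged sketch: collect gradient-noise norms for a histogram.
# The reference gradient and helper name are illustrative assumptions.
import torch


def gradient_noise_norms(model, loss_fn, data, targets, batch_size=64, n_batches=100):
    """Return ||g_batch - g_ref|| for n_batches random mini-batches, where
    g_ref is the gradient over all provided samples (a stand-in for the
    noise-free gradient)."""
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(x, y):
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return torch.cat([p.grad.detach().reshape(-1) for p in params])

    g_ref = flat_grad(data, targets)
    norms = []
    for _ in range(n_batches):
        idx = torch.randint(0, data.shape[0], (batch_size,))
        norms.append((flat_grad(data[idx], targets[idx]) - g_ref).norm().item())
    return norms  # e.g. matplotlib.pyplot.hist(norms) at several checkpoints
```

For a GAN, one would apply this separately to the generator and discriminator parameters at several points during training and compare how the histograms evolve.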
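For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into a configuration sketch; the key names are illustrative assumptions and only the values come from the quoted text.

```python
# Hedged restatement of the quoted hyperparameters; key names are illustrative.
wgan_gp_cifar10 = {
    "lambda_gp": 10,                 # gradient-penalty coefficient (Gulrajani et al., 2017)
    "n_dis": 5,                      # discriminator steps per generator step
    "lr_schedule": "linear decay to 0 over 100k steps",
}

stylegan2_ffhq_128 = {
    "resolution": 128,               # FFHQ downsampled to 128x128
    "batch_size": 32,
    "gamma": 0.1024,
    "map_depth": 2,
    "channel_multiplier": 16384,
}
```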