Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise
Authors: Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our theoretical results, we conduct experiments on heavy-tailed min-max problems to demonstrate the importance of clipping when using non-adaptive methods such as SGDA or SEG. We train a Wasserstein GAN with gradient penalty [Gulrajani et al., 2017] on CIFAR-10 [Krizhevsky et al., 2009] using SGDA, clipped-SGDA, and clipped-SEG, and show the evolution of the gradient noise histograms during training. |
| Researcher Affiliation | Academia | Eduard Gorbunov: MIPT, Russia; Mila & UdeM, Canada; MBZUAI, UAE. Marina Danilova: MIPT, Russia. David Dobre: Mila & UdeM, Canada. Pavel Dvurechensky: WIAS, Germany. Alexander Gasnikov: MIPT, Russia; HSE University, Russia; IITP RAS, Russia. Gauthier Gidel: Mila & UdeM, Canada; Canada CIFAR AI Chair. |
| Pseudocode | Yes | clipped-SEG: $x^{k+1} = x^k - \gamma_2 \widetilde{F}_{\xi_2^k}(\widetilde{x}^k)$, where $\widetilde{x}^k = x^k - \gamma_1 \widetilde{F}_{\xi_1^k}(x^k)$, $\widetilde{F}_{\xi_1^k}(x^k) = \operatorname{clip}\big(\tfrac{1}{m_{1,k}}\sum_{i=1}^{m_{1,k}} F_{\xi_1^{i,k}}(x^k),\, \lambda_{1,k}\big)$, $\widetilde{F}_{\xi_2^k}(\widetilde{x}^k) = \operatorname{clip}\big(\tfrac{1}{m_{2,k}}\sum_{i=1}^{m_{2,k}} F_{\xi_2^{i,k}}(\widetilde{x}^k),\, \lambda_{2,k}\big)$, where $\{\xi_1^{i,k}\}_{i=1}^{m_{1,k}}$, $\{\xi_2^{i,k}\}_{i=1}^{m_{2,k}}$ are independent samples from the distribution $\mathcal{D}$ (a minimal code sketch of this update appears after the table). |
| Open Source Code | Yes | Our codes are publicly available: https://github.com/busycalibrating/clipped-stochastic-methods. |
| Open Datasets | Yes | We train a Wasserstein GAN with gradient penalty [Gulrajani et al., 2017] on CIFAR-10 [Krizhevsky et al., 2009]... We train on FFHQ downsampled to 128×128 pixels [Karras et al., 2019]. |
| Dataset Splits | No | The paper mentions training models on CIFAR-10 and FFHQ and conducting hyperparameter sweeps, but it does not explicitly provide percentages or absolute counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., 'PyTorch 1.9'). It mentions adapting code from a 'publicly available WGAN-GP implementation' and 'pytorch-gan-collections' but without specific versioning. |
| Experiment Setup | Yes | We use the default architectures and training parameters specified in Gulrajani et al. [2017] (λGP = 10, ndis = 5, learning rate decayed linearly to 0 over 100k steps)... We train on FFHQ downsampled to 128×128 pixels, and use the recommended StyleGAN2 hyperparameter configuration for this resolution (batch size = 32, γ = 0.1024, map depth = 2, channel multiplier = 16384). See the configuration sketch after the table. |
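
The clipped-SEG update quoted in the Pseudocode row can be summarized in code. The following is a minimal sketch, not the authors' implementation: `operator_F(x, xi)` (returning the stochastic operator value $F_\xi(x)$) and `sample_batch(m)` (drawing $m$ i.i.d. samples from $\mathcal{D}$) are hypothetical helpers assumed for illustration.

```python
import torch

def clip(v, lam):
    # Clipping operator: rescale v so its norm is at most lam (identity otherwise).
    norm = v.norm()
    return v if norm <= lam else v * (lam / norm)

def clipped_seg_step(x, operator_F, sample_batch,
                     gamma1, gamma2, lam1, lam2, m1, m2):
    # One clipped-SEG iteration:
    #   extrapolation: x_tilde = x - gamma1 * clip(mean of m1 samples of F at x, lam1)
    #   update:        x_new   = x - gamma2 * clip(mean of m2 samples of F at x_tilde, lam2)
    g1 = torch.stack([operator_F(x, xi) for xi in sample_batch(m1)]).mean(dim=0)
    x_tilde = x - gamma1 * clip(g1, lam1)
    g2 = torch.stack([operator_F(x_tilde, xi) for xi in sample_batch(m2)]).mean(dim=0)
    return x - gamma2 * clip(g2, lam2)
```

Clipping the averaged mini-batch estimate (rather than each sample) matches the quoted definition; the per-step clipping levels $\lambda_{1,k}$, $\lambda_{2,k}$ and batch sizes $m_{1,k}$, $m_{2,k}$ are passed in as arguments here.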
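
For the Experiment Setup row, the reported hyperparameters can be gathered into configuration dictionaries. This is only an illustrative summary: the dictionary and field names are assumptions, and only the values quoted from the paper are grounded.

```python
# Hypothetical configs collecting the hyperparameters quoted above;
# names are illustrative, values are those reported in the paper.
WGAN_GP_CIFAR10 = {
    "lambda_gp": 10,   # gradient-penalty coefficient
    "n_dis": 5,        # discriminator steps per generator step
    "lr_schedule": "linear decay to 0 over 100k steps",
}

STYLEGAN2_FFHQ_128 = {
    "resolution": 128,            # FFHQ downsampled to 128x128
    "batch_size": 32,
    "gamma": 0.1024,              # recommended StyleGAN2 value for this resolution
    "map_depth": 2,
    "channel_multiplier": 16384,
}
```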