A Neural Tangent Kernel Perspective of GANs

Authors: Jean-Yves Franceschi, Emmanuel De Bézenac, Ibrahim Ayed, Mickael Chen, Sylvain Lamprier, Patrick Gallinari

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically corroborate these results via an analysis toolkit based on our framework, unveiling intuitions that are consistent with GAN practice. [...] We present a selection of empirical results for different losses and architectures to show the relevance of our framework, with more insights in Appendix C, by evaluating its adequacy and practical implications on GAN convergence.
Researcher Affiliation | Collaboration | 1 Criteo AI Lab, Paris, France; 2 Sorbonne Université, CNRS, ISIR, F-75005 Paris, France; 3 Seminar for Applied Mathematics, D-MATH, ETH Zürich, Rämistrasse 101, Zürich-8092, Switzerland; 4 ThereSIS Lab, Thales, Palaiseau, France; 5 Valeo.ai, Paris, France.
Pseudocode | No | The paper describes methods and processes using mathematical equations and prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Moreover, we release an analysis toolkit based on our framework, GAN(TK)2, which we use to empirically validate our analysis and gather new empirical insights: for example, we study the singular performance of the ReLU activation in GAN architectures. [...] that we release at https://github.com/emited/gantk2
Open Datasets | Yes | 8 Gaussians. The target distribution is composed of 8 Gaussians... AB and Density. These two datasets are taken from the Geomloss library examples (Feydy et al., 2019)... MNIST and CelebA. We preprocess each MNIST image (LeCun et al., 1998)... CelebA images (Liu et al., 2015)... (an illustrative sampler for the 8-Gaussians target is sketched after the table)
Dataset Splits | No | The paper describes sampling methods and the size of the subsets used for experiments but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, exact counts, or cross-validation setup).
Hardware Specification | Yes | All experiments presented in this paper were run on Nvidia GPUs (Nvidia Titan RTX with 24 GB of VRAM and CUDA 11.2, as well as Nvidia Titan V 12 GB and Nvidia GeForce RTX 2080 Ti 11 GB with CUDA 10.2).
Software Dependencies | Yes | GAN(TK)2 is implemented in Python (tested on versions 3.8.1 and 3.9.2) and based on JAX (Bradbury et al., 2018) for tensor computations and Neural Tangents (Novak et al., 2020) for NTKs. (a minimal Neural Tangents usage sketch follows the table)
Experiment Setup | Yes | We used for the neural networks of our experiments the standard NTK parameterization (Jacot et al., 2018), with a scaling factor of 1 for matrix multiplications and, when bias is enabled, a multiplicative constant of 1 for biases (except for sigmoid, where this bias factor is lowered to 0.2 to avoid saturating the sigmoid, and for CelebA, where it is equal to 4). All considered networks are composed of 3 hidden layers and end with a linear layer. In the finite-width case, the width of these hidden layers is 128. We additionally use antisymmetrical initialization (Zhang et al., 2020), except for the finite-width LSGAN loss. Discriminators in the finite-width regime are trained using full-batch gradient descent without momentum, with one step per update to the distributions and the following learning rates ε: for the IPM loss, ε = 0.01; for the IPM loss with reset and for LSGAN, ε = 0.1. (an illustrative finite-width discriminator and training step are sketched after the table)
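
For concreteness, the 8 Gaussians target named in the Open Datasets row is typically a ring of 8 Gaussian modes; the sketch below is a minimal sampler under that assumption. The ring radius, per-mode standard deviation, and sample count are illustrative placeholders, not the exact values used in the GAN(TK)2 experiments.

```python
import numpy as np

def sample_8_gaussians(n, radius=2.0, std=0.02, seed=0):
    # Illustrative 8-Gaussians ring target: each sample is drawn from one of
    # 8 Gaussian modes placed uniformly on a circle. Radius/std are assumptions.
    rng = np.random.default_rng(seed)
    angles = 2.0 * np.pi * rng.integers(0, 8, size=n) / 8.0
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return centers + std * rng.standard_normal((n, 2))

target_samples = sample_8_gaussians(500)  # (500, 2) array of target points
```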
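
Since the Software Dependencies row names JAX and Neural Tangents, here is a minimal sketch of how an analytic NTK can be obtained with the Neural Tangents stax API. The two ReLU layers of width 512 and the 2-D toy inputs are placeholder assumptions, not the paper's configuration.

```python
import jax
from neural_tangents import stax

# Build an infinite-width MLP; stax.serial returns (init_fn, apply_fn, kernel_fn).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x1 = jax.random.normal(jax.random.PRNGKey(0), (16, 2))  # toy 2-D inputs
x2 = jax.random.normal(jax.random.PRNGKey(1), (8, 2))
ntk = kernel_fn(x1, x2, 'ntk')  # closed-form NTK matrix of shape (16, 8)
```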
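
As a rough illustration of the Experiment Setup row, the sketch below builds a finite-width discriminator with the NTK parameterization (3 hidden layers of width 128, a linear output layer, and scaling factors of 1 for weights and biases) and performs one full-batch gradient-descent step at ε = 0.01 for an IPM-style loss. The ReLU activation, the 2-D input shape, and the omission of antisymmetrical initialization are simplifying assumptions, not the paper's exact configuration.

```python
import jax
import jax.numpy as jnp
from neural_tangents import stax

# Finite-width discriminator: NTK parameterization, 3 hidden layers of width 128,
# linear output layer; W_std=1 and b_std=1 mirror the scaling factors of 1.
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(128, W_std=1.0, b_std=1.0, parameterization='ntk'), stax.Relu(),
    stax.Dense(128, W_std=1.0, b_std=1.0, parameterization='ntk'), stax.Relu(),
    stax.Dense(128, W_std=1.0, b_std=1.0, parameterization='ntk'), stax.Relu(),
    stax.Dense(1, W_std=1.0, b_std=1.0, parameterization='ntk'),
)

_, params = init_fn(jax.random.PRNGKey(0), input_shape=(-1, 2))

def ipm_loss(params, x_real, x_fake):
    # IPM-style objective: the discriminator maximizes E[f(real)] - E[f(fake)],
    # so we minimize its negation with gradient descent.
    return -(jnp.mean(apply_fn(params, x_real)) - jnp.mean(apply_fn(params, x_fake)))

lr = 0.01  # learning rate reported for the IPM loss

@jax.jit
def discriminator_step(params, x_real, x_fake):
    # One full-batch gradient-descent step without momentum.
    grads = jax.grad(ipm_loss)(params, x_real, x_fake)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```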