A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
Authors: Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jarret Ross, Tianbao Yang, Payel Das
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on GANs demonstrate the effectiveness of the proposed algorithm. We empirically demonstrate the effectiveness of the proposed algorithm using a variant of DPOSG implementing Adam updates and show a speedup compared with the single-machine baseline for different neural network architectures on several benchmark datasets, including WGAN-GP on CIFAR10 [22] and Self-Attention GAN on ImageNet [23]. |
| Researcher Affiliation | Collaboration | Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA |
| Pseudocode | Yes | Algorithm 1 Decentralized Parallel Optimistic Stochastic Gradient (DPOSG); a generic illustrative sketch of such an update is given below the table. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of their methodology. |
| Open Datasets | Yes | We consider two experiments. The first one is WGAN-GP [22] on the CIFAR10 dataset, and the second one is Self-Attention GAN [23] on the ImageNet dataset. |
| Dataset Splits | No | The paper mentions tuning learning rates and using the best performing one for CP-OAdam, implying the use of a validation set. However, it does not provide specific details on the dataset splits (e.g., percentages or sample counts for training, validation, and test sets), nor does it reference standard validation splits with citations. |
| Hardware Specification | No | The paper mentions running experiments in a "HPC environment" and a "cloud computing environment" but does not specify any exact GPU or CPU models, memory details, or other specific hardware configurations used. |
| Software Dependencies | No | The paper mentions using "deep learning frameworks (e.g., TensorFlow, PyTorch, etc.)" but does not provide specific version numbers for these or any other software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | For both experiments, we tune the learning rate in {1×10⁻³, 4×10⁻⁴, 2×10⁻⁴, 1×10⁻⁴, 4×10⁻⁵, 2×10⁻⁵, 1×10⁻⁵} and choose the one which delivers the best performance for the centralized baseline (CP-OAdam); the decentralized algorithms (DP-OAdam, Rand-DP-OAdam) use the same learning rate as CP-OAdam. For Self-Attention GAN on ImageNet, we tune different learning rates for the discriminator and generator respectively and choose 1×10⁻³ for the generator and 4×10⁻⁵ for the discriminator. We fix the total batch size at 256 (i.e., the product of the batch size per learner and the number of learners is 256). |
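To make the reported setup concrete, here is a minimal Python sketch of the configuration described in the Experiment Setup row. Only the numeric values (the learning-rate grid, the Self-Attention GAN learning rates, and the total batch size of 256) come from the paper's description; the variable names and the `per_learner_batch_size` helper are illustrative assumptions, not the authors' code.

```python
# Values taken from the Experiment Setup row; names and helper are assumptions.

LR_GRID = [1e-3, 4e-4, 2e-4, 1e-4, 4e-5, 2e-5, 1e-5]   # grid tuned for CP-OAdam
SAGAN_LR = {"generator": 1e-3, "discriminator": 4e-5}   # Self-Attention GAN on ImageNet
TOTAL_BATCH_SIZE = 256                                   # fixed total across all learners


def per_learner_batch_size(num_learners: int) -> int:
    """Batch size per learner so that (per-learner batch) * (#learners) = 256."""
    assert TOTAL_BATCH_SIZE % num_learners == 0, "256 must be divisible by #learners"
    return TOTAL_BATCH_SIZE // num_learners


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(f"{n} learner(s): batch size per learner = {per_learner_batch_size(n)}")
```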
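The Pseudocode row refers to the paper's Algorithm 1 (DPOSG), which is not reproduced here. The following is a generic, heavily simplified sketch of the two ingredients the name implies: gossip averaging over a doubly stochastic mixing matrix and an optimistic (OGDA-style) stochastic gradient correction. The ring topology, the toy quadratic objective, and all constants are assumptions for illustration only; this is not the authors' Algorithm 1 and it omits, among other things, the Adam-based variant used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_learners, dim, lr, steps = 4, 10, 1e-3, 200

# Ring topology: each learner averages with itself and its two neighbours,
# giving a doubly stochastic mixing matrix W (rows and columns sum to 1).
W = np.zeros((n_learners, n_learners))
for i in range(n_learners):
    W[i, i] = 1.0 / 3.0
    W[i, (i - 1) % n_learners] = 1.0 / 3.0
    W[i, (i + 1) % n_learners] = 1.0 / 3.0

x = rng.normal(size=(n_learners, dim))   # local parameter copies, one row per learner
g_prev = np.zeros_like(x)                # previous stochastic gradients


def stoch_grad(xi):
    """Noisy gradient of a toy quadratic 0.5*||x||^2 (stand-in for a GAN loss)."""
    return xi + 0.1 * rng.normal(size=xi.shape)


for t in range(steps):
    x_avg = W @ x                                             # one gossip/averaging round
    g = np.stack([stoch_grad(x[i]) for i in range(n_learners)])
    x = x_avg - lr * (2.0 * g - g_prev)                       # optimistic (OGDA-style) step
    g_prev = g

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```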