Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

Authors: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate our theoretical results by providing real-world experiments and training generative adversarial networks on multiple GPUs."
Researcher Affiliation | Academia | ali@uio.no; firstname.lastname@epfl.ch. "These authors contributed equally to this work. Department of Informatics, University of Oslo. Work performed at EPFL (LIONS) and Aalborg University. Laboratory for Information and Inference Systems (LIONS), EPFL."
Pseudocode | Yes | Algorithm 1 (Q-GenX): loops are executed in parallel on processors; at certain steps, each processor computes sufficient statistics of a parametric distribution to estimate the distribution of dual vectors. A minimal sketch of such a step is given after this table.
Open Source Code | Yes | "Open source code will be released at https://github.com/LIONS-EPFL/qgenx"
Open Datasets | Yes | The authors "train a WGAN-GP (Arjovsky et al., 2017) on CIFAR10 (Krizhevsky, 2009)."
Dataset Splits | No | The paper states that it uses CIFAR10, which has standard splits, but does not explicitly provide training/validation/test split percentages or sample counts.
Hardware Specification | Yes | The experiments were performed on 3 Nvidia V100 GPUs (1 per node) on a Kubernetes cluster, using an image built on the torch_cgx Docker image.
Software Dependencies | No | The paper mentions the "torch_cgx" PyTorch extension, Open MPI, and Weights and Biases, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | The authors share an effective batch size of 1024 across 3 nodes (strong scaling) connected via Ethernet and use LayerNorm (Ba et al., 2016) instead of BatchNorm (Ioffe & Szegedy, 2015). The results in Figure 1 (left) show the evolution of FID. A hedged sketch of this setup follows the table.
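To make the parallel extra-gradient loop summarized in the Pseudocode row concrete, below is a minimal, hypothetical sketch of one extrapolation-plus-update step with gradients averaged across workers via torch.distributed. All names (allreduce_mean_, extragradient_step, grad_fn) are illustrative and not taken from the paper; in particular, the paper's adaptive quantization, in which each processor estimates the distribution of dual vectors from sufficient statistics, is replaced here by a plain full-precision all-reduce.

```python
import torch
import torch.distributed as dist

def allreduce_mean_(grads, world_size):
    """Average a list of gradient tensors across all workers, in place."""
    for g in grads:
        dist.all_reduce(g, op=dist.ReduceOp.SUM)
        g /= world_size

def extragradient_step(params, grad_fn, lr, world_size):
    """One step of distributed extra-gradient: extrapolate, then update.

    params:  list of tensors holding the current iterate x_t
    grad_fn: callable mapping a list of tensors to a list of stochastic gradients
    """
    # Extrapolation: leading point x_{t+1/2} = x_t - lr * g(x_t).
    g = grad_fn(params)
    allreduce_mean_(g, world_size)
    leading = [p - lr * gi for p, gi in zip(params, g)]

    # Update: step from x_t using the gradient evaluated at the leading point.
    g_half = grad_fn(leading)
    allreduce_mean_(g_half, world_size)
    with torch.no_grad():
        for p, gi in zip(params, g_half):
            p -= lr * gi
```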
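The Experiment Setup row can likewise be illustrated with a short, hedged sketch: a per-worker batch derived from the shared effective batch of 1024 over 3 workers (strong scaling), and a discriminator block using LayerNorm in place of BatchNorm, a common choice for WGAN-GP because the gradient penalty is computed per sample. Layer sizes and helper names are assumptions, not taken from the paper's code.

```python
import torch.nn as nn

WORLD_SIZE = 3           # 3 nodes, one V100 GPU each (as reported)
EFFECTIVE_BATCH = 1024   # shared across workers, i.e. strong scaling
# 1024 is not divisible by 3; the exact per-worker split is not stated,
# so integer division is used here purely for illustration.
per_worker_batch = EFFECTIVE_BATCH // WORLD_SIZE

def disc_block(in_ch, out_ch, out_spatial):
    """Discriminator conv block with per-sample LayerNorm instead of BatchNorm."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.LayerNorm([out_ch, out_spatial, out_spatial]),  # normalizes over (C, H, W)
        nn.LeakyReLU(0.2),
    )

# Example: first block of a CIFAR10 (32x32) critic, producing 16x16 feature maps.
block = disc_block(3, 64, 16)
```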