Byzantine-Tolerant Methods for Distributed Variational Inequalities
Authors: Nazarii Tupitsa, Abdulla Jasem Almansoori, Yanlin Wu, Martin Takac, Karthik Nandakumar, Samuel Horváth, Eduard Gorbunov
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work makes a further step in this direction by providing several (provably) Byzantine-robust methods for distributed variational inequality, thoroughly studying their theoretical convergence, removing the limitations of the previous work, and providing numerical comparisons supporting the theoretical findings. |
| Researcher Affiliation | Academia | Nazarii Tupitsa (MBZUAI, MIPT); Abdulla Jasem Almansoori (MBZUAI); Yanlin Wu (MBZUAI); Martin Takáč (MBZUAI); Karthik Nandakumar (MBZUAI); Samuel Horváth (MBZUAI); Eduard Gorbunov |
| Pseudocode | Yes | Algorithm 1 SGDA-RA; Algorithm 2 SEG-RA; Algorithm 3 M-SGDA-RA; Algorithm 4 Check Computations; Algorithm 5 SGDA-CC; Algorithm 6 R-SGDA-CC; Algorithm 7 SEG-CC; Algorithm 8 R-SEG-CC. |
| Open Source Code | Yes | Code for quadratic games is available at https://github.com/nazya/sgda-ra. ... Code for GANs is available at https://github.com/zeligism/vi-robust-agg. |
| Open Datasets | Yes | We conduct numerical experiments on a quadratic game... Robust Neural Networks training. ... {(x_i, y_i)}_{i=1}^N is the MNIST dataset. ... The dataset we chose for this experiment is CIFAR-10. |
| Dataset Splits | Yes | Specifically, we show the validation error on MNIST after each epoch. |
| Hardware Specification | No | The paper mentions "simulate n = 20 nodes on a single machine" but does not provide specific hardware details such as CPU/GPU models, processor types, or memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | We set the parameter α = 0.1 for M-SGDA-RA, and the following parameters for RDEG: α_RDEG = 0.06, δ_RDEG = 0.9 and theoretical value of ϵ. ... γ = 2e-5. ... We fix the learning rate to 0.01 and use a batch size of 32. We run the algorithm for 50 epochs and average our results across 3 runs. ... We let n = 20, B = 4, λ_1 = 0, and λ_2 = 100. ... We let n = 10, B = 2, and choose a learning rate of 0.001, β_1 = 0.5, and β_2 = 0.9 with a batch size of 64. We run the algorithms for 4600 epochs. |
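
The "Pseudocode" and "Experiment Setup" rows above mention SGDA-RA (Algorithm 1) and a quadratic-game configuration with n = 20 simulated workers, B = 4 Byzantine workers, and step size γ = 2e-5. The sketch below illustrates the general shape of such a method: each worker sends a stochastic operator estimate, the server applies a robust aggregator instead of plain averaging, and then takes a gradient-descent-ascent step. The toy quadratic game, the Gaussian noise model, the sign-flip attack, and the coordinate-wise-median aggregator are illustrative assumptions of this sketch, not the paper's exact Algorithm 1 or its released experimental code.

```python
# Minimal sketch of SGDA with robust aggregation (in the spirit of SGDA-RA).
# Assumptions not taken from the paper: coordinate-wise median as the robust
# aggregator, a toy monotone quadratic game F_i(z) = A_i z + b_i per worker,
# and a simple sign-flip Byzantine attack.
import numpy as np

rng = np.random.default_rng(0)

n_workers = 20      # the paper simulates n = 20 nodes on a single machine
n_byzantine = 4     # B = 4 Byzantine workers, as in the quoted quadratic-game setup
dim = 10            # problem dimension (illustrative choice)
gamma = 2e-5        # step size gamma = 2e-5, quoted from the experiment-setup row
n_iters = 20_000    # iteration count chosen for this toy problem, not from the paper

# Each honest worker i holds an operator F_i(z) = A_i z + b_i whose symmetric
# part is positive definite, so the averaged operator is strongly monotone.
A_list, b_list = [], []
for _ in range(n_workers):
    M = rng.normal(size=(dim, dim))
    A_i = M @ M.T / dim + np.eye(dim) + (M - M.T) / 2.0  # PD symmetric + skew part
    A_list.append(A_i)
    b_list.append(rng.normal(size=dim))

def honest_estimate(i, z):
    """Stochastic estimate of F_i(z): exact operator plus Gaussian noise."""
    return A_list[i] @ z + b_list[i] + 0.1 * rng.normal(size=dim)

def byzantine_estimate(i, z):
    """Byzantine worker: send a scaled, sign-flipped version of the honest message."""
    return -5.0 * honest_estimate(i, z)

def coordinate_median(msgs):
    """Coordinate-wise median -- one possible robust aggregator (assumption)."""
    return np.median(np.stack(msgs), axis=0)

z = rng.normal(size=dim)
for _ in range(n_iters):
    msgs = [
        byzantine_estimate(i, z) if i < n_byzantine else honest_estimate(i, z)
        for i in range(n_workers)
    ]
    g = coordinate_median(msgs)  # robust aggregation instead of plain averaging
    z = z - gamma * g            # SGDA step on the aggregated operator estimate

# Residual of the averaged operator at the final iterate (smaller is better).
F_avg = np.mean([A_list[i] @ z + b_list[i] for i in range(n_workers)], axis=0)
print("||F(z)|| at final iterate:", np.linalg.norm(F_avg))
```

Swapping `coordinate_median` for another aggregator (e.g., a trimmed mean) or changing the attack only requires replacing the corresponding function; the repositories linked in the "Open Source Code" row contain the authors' actual implementations of SGDA-RA, SEG-RA, and the checking-of-computations variants.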