Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates
Authors: Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, Vaneet Aggarwal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases. (Section 5: Simulations) |
| Researcher Affiliation | Academia | Guangchen Lan Purdue University West Lafayette, IN 47907 lan44@purdue.edu Han Wang Columbia University New York, NY 10027 hw2786@columbia.edu James Anderson Columbia University New York, NY 10027 anderson@ee.columbia.edu Christopher Brinton Purdue University West Lafayette, IN 47907 cgb@purdue.edu Vaneet Aggarwal Purdue University West Lafayette, IN 47907 vaneet@purdue.edu |
| Pseudocode | Yes | Algorithm 1: FedNPG-ADMM (see the ADMM sketch after this table) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, such as a specific repository link or an explicit code release statement. |
| Open Datasets | Yes | We consider three MuJoCo tasks [44] with the MIT License, which have continuous state spaces. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for validation. |
| Hardware Specification | Yes | The tasks are trained on an NVIDIA RTX 3080 GPU with 10 GB of memory. |
| Software Dependencies | No | The paper mentions software like PyTorch, Adam optimizer, and stable-baselines3 but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Table 4: Hyperparameter and MLP settings (see the configuration sketch below). Swimmer-v4: MLP 64-64, ReLU activations, Tanh output, penalty ρ = 0.1, radius δ = 0.01, discount γ = 0.99, timesteps T = 2048, iterations K = 1×10³, learning rate 3×10⁻⁴. Hopper-v4: MLP 128-128, ReLU, Tanh, ρ = 0.1, δ = 0.01, γ = 0.99, T = 1024, K = 2×10³, learning rate 3×10⁻⁴. Humanoid-v4: MLP 512-512-512, ReLU, Tanh, ρ = 0.01, δ = 0.01, γ = 0.99, T = 512, K = 3×10³, learning rate 1×10⁻⁵. |
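
The pseudocode row above refers to Algorithm 1 (FedNPG-ADMM), which is not reproduced on this page. As a rough, hedged illustration of the idea in the paper's title, the sketch below applies consensus ADMM to the standard natural-gradient direction-finding subproblem, so that agents exchange only d-dimensional vectors rather than d×d Fisher matrices. The function and variable names (`admm_natural_gradient`, `F_list`, `g_list`) are hypothetical, and this is not the authors' exact update rule.

```python
# Hedged sketch (not the paper's Algorithm 1): consensus ADMM for the
# natural-gradient direction w* = (mean_n F_n)^{-1} (mean_n g_n), where
# F_n is agent n's local Fisher estimate and g_n its local policy gradient.
# Agents exchange only d-dimensional vectors (w_n + u_n), never d x d matrices.
import numpy as np


def admm_natural_gradient(F_list, g_list, rho=0.1, num_iters=200):
    """Approximate the averaged natural-gradient direction with consensus ADMM."""
    n_agents, d = len(F_list), g_list[0].shape[0]
    z = np.zeros(d)                                # server-side consensus direction
    u = [np.zeros(d) for _ in range(n_agents)]     # per-agent dual variables

    for _ in range(num_iters):
        # Agent step: argmin_w 0.5 w^T F_n w - g_n^T w + (rho/2) ||w - z + u_n||^2
        w = [np.linalg.solve(F_list[n] + rho * np.eye(d),
                             g_list[n] + rho * (z - u[n]))
             for n in range(n_agents)]
        # Server step: average the O(d) messages w_n + u_n.
        z = np.mean([w[n] + u[n] for n in range(n_agents)], axis=0)
        # Dual step: accumulate each agent's consensus violation.
        u = [u[n] + w[n] - z for n in range(n_agents)]
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_agents = 4, 3
    F_list = []
    for _ in range(n_agents):
        A = rng.standard_normal((d, d))
        F_list.append(A @ A.T + np.eye(d))         # symmetric positive-definite Fisher stand-in
    g_list = [rng.standard_normal(d) for _ in range(n_agents)]
    w_admm = admm_natural_gradient(F_list, g_list)
    w_exact = np.linalg.solve(np.mean(F_list, axis=0), np.mean(g_list, axis=0))
    print("relative error:", np.linalg.norm(w_admm - w_exact) / np.linalg.norm(w_exact))
```

At a fixed point, all local directions equal z and the dual variables average to zero, so z solves (mean F_n) z = mean g_n; the penalty ρ in Table 4 plays the same role as `rho` here.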
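
For concreteness, the Table 4 settings can also be written as a plain configuration, with the listed MLP sizes built as ReLU hidden layers and a Tanh output head. Only the numbers come from the paper; `HYPERPARAMS`, `make_policy_mlp`, and the network construction details are assumptions for illustration.

```python
# Hedged sketch: Table 4 hyperparameters as a configuration dictionary, plus a
# hypothetical builder for the listed MLPs (ReLU hidden layers, Tanh output).
import torch.nn as nn

HYPERPARAMS = {
    "Swimmer-v4":  dict(hidden=(64, 64),        rho=0.1,  delta=0.01, gamma=0.99,
                        timesteps=2048, iterations=1_000, lr=3e-4),
    "Hopper-v4":   dict(hidden=(128, 128),      rho=0.1,  delta=0.01, gamma=0.99,
                        timesteps=1024, iterations=2_000, lr=3e-4),
    "Humanoid-v4": dict(hidden=(512, 512, 512), rho=0.01, delta=0.01, gamma=0.99,
                        timesteps=512,  iterations=3_000, lr=1e-5),
}


def make_policy_mlp(obs_dim: int, act_dim: int, hidden) -> nn.Sequential:
    """Stack Linear+ReLU hidden layers and finish with a Tanh output head."""
    layers, in_dim = [], obs_dim
    for h in hidden:
        layers += [nn.Linear(in_dim, h), nn.ReLU()]
        in_dim = h
    layers += [nn.Linear(in_dim, act_dim), nn.Tanh()]
    return nn.Sequential(*layers)


# Example: a Hopper-v4 policy (11-dim observations, 3-dim actions in Gymnasium).
policy = make_policy_mlp(11, 3, HYPERPARAMS["Hopper-v4"]["hidden"])
```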