Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

Authors: Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, Vaneet Aggarwal

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases. (Section 5, Simulations)
Researcher Affiliation | Academia | Guangchen Lan (Purdue University, West Lafayette, IN 47907, lan44@purdue.edu); Han Wang (Columbia University, New York, NY 10027, hw2786@columbia.edu); James Anderson (Columbia University, New York, NY 10027, anderson@ee.columbia.edu); Christopher Brinton (Purdue University, West Lafayette, IN 47907, cgb@purdue.edu); Vaneet Aggarwal (Purdue University, West Lafayette, IN 47907, vaneet@purdue.edu)
Pseudocode | Yes | Algorithm 1: FedNPG-ADMM (a rough sketch of the ADMM-based direction computation is given after this table)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, such as a specific repository link or an explicit code release statement.
Open Datasets | Yes | We consider three MuJoCo tasks [44] with the MIT License, which have continuous state spaces.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for validation.
Hardware Specification | Yes | The tasks are trained on an NVIDIA RTX 3080 GPU with 10 GB of memory.
Software Dependencies | No | The paper mentions software such as PyTorch, the Adam optimizer, and stable-baselines3, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | Table 4: Hyperparameter and MLP settings, per task (Swimmer-v4 / Hopper-v4 / Humanoid-v4); a runnable configuration sketch follows this table.
    MLP hidden layers: (64, 64) / (128, 128) / (512, 512, 512)
    Activation function: ReLU / ReLU / ReLU
    Output function: Tanh / Tanh / Tanh
    Penalty (ρ): 0.1 / 0.1 / 0.01
    Radius (δ): 0.01 / 0.01 / 0.01
    Discount (γ): 0.99 / 0.99 / 0.99
    Timesteps (T): 2048 / 1024 / 512
    Iterations (K): 1×10³ / 2×10³ / 3×10³
    Learning rate: 3×10⁻⁴ / 3×10⁻⁴ / 1×10⁻⁵
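
The Pseudocode row above names Algorithm 1 (FedNPG-ADMM) but does not reproduce the listing. As a rough illustration of the underlying idea, the following is a minimal consensus-ADMM sketch for computing a natural-policy-gradient direction (Σᵢ Fᵢ)⁻¹(Σᵢ gᵢ) while keeping each agent's Fisher matrix local, so that only d-dimensional vectors are exchanged with the server. This is a generic sketch under our own assumptions, not the paper's exact update rule; the function name and iteration count are hypothetical, and only the penalty ρ corresponds to the "Penalty (ρ)" entry quoted from Table 4.

```python
import numpy as np

def fednpg_direction_admm(fishers, grads, rho=0.1, num_admm_iters=50):
    """Consensus-ADMM sketch of a federated NPG direction.

    Approximates (sum_i F_i)^{-1} (sum_i g_i): each agent i keeps its
    Fisher matrix F_i local and only d-dimensional vectors are sent to
    the server. NOTE: generic illustration, not the authors' Algorithm 1.
    """
    n, d = len(fishers), grads[0].shape[0]
    x = [np.zeros(d) for _ in range(n)]   # local copies of the direction
    u = [np.zeros(d) for _ in range(n)]   # scaled dual variables
    z = np.zeros(d)                       # server-side consensus variable
    eye = np.eye(d)
    for _ in range(num_admm_iters):
        # Local step (agent i): solve (F_i + rho*I) x_i = g_i + rho*(z - u_i).
        for i in range(n):
            x[i] = np.linalg.solve(fishers[i] + rho * eye,
                                   grads[i] + rho * (z - u[i]))
        # Server step: average the d-dimensional messages x_i + u_i.
        z = np.mean([x[i] + u[i] for i in range(n)], axis=0)
        # Dual step (agent i): accumulate the consensus residual.
        for i in range(n):
            u[i] += x[i] - z
    return z  # approximate natural-gradient direction


# Toy usage: 4 agents, 6-dimensional parameter vector.
rng = np.random.default_rng(0)
fishers, grads = [], []
for _ in range(4):
    A = rng.standard_normal((6, 6))
    fishers.append(A @ A.T + np.eye(6))   # symmetric positive-definite "Fisher"
    grads.append(rng.standard_normal(6))
direction = fednpg_direction_admm(fishers, grads, rho=0.1)
```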
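
For convenience, the quoted Table 4 settings can also be collected into a small per-task configuration. The numbers are transcribed from the table above; the Gymnasium environment IDs, the dictionary layout, and the field names are our own assumptions rather than the authors' released code.

```python
import gymnasium as gym  # requires gymnasium[mujoco] for these tasks

# Per-task settings transcribed from Table 4 (hidden-layer widths, ADMM
# penalty rho, radius delta, discount gamma, rollout timesteps T, outer
# iterations K, learning rate). Hidden activations are ReLU, outputs Tanh.
TABLE4_SETTINGS = {
    "Swimmer-v4":  dict(mlp=(64, 64),        rho=0.1,  delta=0.01, gamma=0.99,
                        timesteps=2048, iterations=1_000, lr=3e-4),
    "Hopper-v4":   dict(mlp=(128, 128),      rho=0.1,  delta=0.01, gamma=0.99,
                        timesteps=1024, iterations=2_000, lr=3e-4),
    "Humanoid-v4": dict(mlp=(512, 512, 512), rho=0.01, delta=0.01, gamma=0.99,
                        timesteps=512,  iterations=3_000, lr=1e-5),
}

for task, cfg in TABLE4_SETTINGS.items():
    env = gym.make(task)
    print(task, env.observation_space.shape, cfg["mlp"])
    env.close()
```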