Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

Authors: Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, Vaneet Aggarwal

NeurIPS 2023 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases." (Section 5, Simulations) |
| Researcher Affiliation | Academia | Guangchen Lan (Purdue University, West Lafayette, IN 47907); Han Wang (Columbia University, New York, NY 10027); James Anderson (Columbia University, New York, NY 10027); Christopher Brinton (Purdue University, West Lafayette, IN 47907); Vaneet Aggarwal (Purdue University, West Lafayette, IN 47907) |
| Pseudocode | Yes | Algorithm 1: FedNPG-ADMM |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology, such as a repository link or an explicit code-release statement. |
| Open Datasets | Yes | "We consider three MuJoCo tasks [44] with the MIT License, which have continuous state spaces." |
| Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology). |
| Hardware Specification | Yes | The tasks are trained on an NVIDIA RTX 3080 GPU with 10 GB of memory. |
| Software Dependencies | No | The paper mentions PyTorch, the Adam optimizer, and stable-baselines3, but does not give version numbers for these dependencies. |
| Experiment Setup | Yes | Table 4 (hyperparameter and MLP settings), reconstructed below. |

Table 4: Hyperparameter and MLP settings.

| Hyperparameter | Swimmer-v4 | Hopper-v4 | Humanoid-v4 |
|---|---|---|---|
| MLP hidden layers | 64, 64 | 128, 128 | 512, 512, 512 |
| Activation function | ReLU | ReLU | ReLU |
| Output function | Tanh | Tanh | Tanh |
| Penalty (ρ) | 0.1 | 0.1 | 0.01 |
| Radius (δ) | 0.01 | 0.01 | 0.01 |
| Discount (γ) | 0.99 | 0.99 | 0.99 |
| Timesteps (T) | 2048 | 1024 | 512 |
| Iterations (K) | 1×10³ | 2×10³ | 3×10³ |
| Learning rate | 3×10⁻⁴ | 3×10⁻⁴ | 1×10⁻⁵ |
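The paper's Algorithm 1 (FedNPG-ADMM) is not reproduced in this report. As a rough illustration of the general pattern it builds on, the sketch below shows generic consensus ADMM in plain Python: each agent solves a penalized local problem, a server averages the results, and dual variables drive the agents toward agreement. The quadratic local objective, the function name `consensus_admm`, and all variable names are illustrative stand-ins, not the paper's formulation; only the penalty ρ = 0.1 is taken from Table 4 (Swimmer-v4/Hopper-v4 setting).

```python
def consensus_admm(local_targets, rho=0.1, iters=2000):
    """Generic consensus ADMM (illustrative, not the paper's Algorithm 1).

    Each agent i minimizes (1/2) * (x_i - a_i)^2 subject to x_i == z,
    where z is a shared consensus variable. The quadratic objective is
    a stand-in for each agent's local update, not the NPG loss.
    """
    n = len(local_targets)
    x = [0.0] * n          # local primal variables (one per agent)
    y = [0.0] * n          # dual variables for the consensus constraints
    z = 0.0                # server-side consensus variable
    for _ in range(iters):
        # Local step: closed-form minimizer of the augmented Lagrangian
        # (1/2)(x - a_i)^2 + y_i*(x - z) + (rho/2)*(x - z)^2.
        x = [(a - y_i + rho * z) / (1.0 + rho)
             for a, y_i in zip(local_targets, y)]
        # Server step: average the penalized local variables.
        z = sum(x_i + y_i / rho for x_i, y_i in zip(x, y)) / n
        # Dual ascent on the consensus constraint x_i == z.
        y = [y_i + rho * (x_i - z) for x_i, y_i in zip(x, y)]
    return z
```

For this separable quadratic, consensus ADMM converges to the average of the local targets, i.e. it recovers plain gradient averaging while exchanging only local variables and duals with the server.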