Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Authors: Akhil Agnihotri, Rahul Jain, Haipeng Luo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym environments, show its superior empirical performance when compared to other state-of-the-art algorithms adapted for the ACMDPs. |
| Researcher Affiliation | Collaboration | 1University of Southern California, Los Angeles, CA, USA. RJ is also affiliated with Google DeepMind. |
| Pseudocode | Yes | Algorithm 1 Average-Constrained Policy Optimization (ACPO) |
| Open Source Code | No | Code of the ACPO implementation will be made available on GitHub. |
| Open Datasets | Yes | We work with the OpenAI Gym environments to train the various learning agents on the following tasks: Gather, Circle, Grid, and Bottleneck (see Figure 3 in Appendix A.6.1 for more details on the environments). For our experimental evaluation, we use several OpenAI Gym environments from Todorov et al. (2012). |
| Dataset Splits | No | The paper describes training steps and evaluation trajectories but does not explicitly provide percentages or counts for a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using implementations from specific GitHub repositories and Open AI Gym environments but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 1. Hyperparameter Setup includes: No. of hidden layers, Activation, Initial log std, Batch size, GAE parameter (reward), GAE parameter (cost), Trust region step size δ, Learning rate for policy, Learning rate for reward critic net, Learning rate for cost critic net, Backtracking coeff., Max backtracking iterations, Max conjugate gradient iterations, Recovery regime parameter t. Also, Section 5.1 details neural network sizes. |
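To make the "Experiment Setup" fields above concrete, the sketch below collects the hyperparameter names from the paper's Table 1 into a single config dictionary and validates it before a run. This is an illustrative sketch only: the field names mirror the Table 1 list, but every value shown is a placeholder of my own choosing, not a setting reported by the authors.

```python
# Sketch of the ACPO hyperparameter setup described in Table 1 of the paper.
# Field names follow the table; the values are PLACEHOLDERS, not the
# authors' reported settings.
acpo_hparams = {
    "n_hidden_layers": 2,                 # placeholder value
    "activation": "tanh",                 # placeholder value
    "initial_log_std": -0.5,              # placeholder value
    "batch_size": 2048,                   # placeholder value
    "gae_lambda_reward": 0.95,            # GAE parameter (reward); placeholder
    "gae_lambda_cost": 0.95,              # GAE parameter (cost); placeholder
    "trust_region_step_size_delta": 0.01, # placeholder value
    "lr_policy": 3e-4,                    # placeholder value
    "lr_reward_critic": 1e-3,             # placeholder value
    "lr_cost_critic": 1e-3,               # placeholder value
    "backtracking_coeff": 0.8,            # placeholder value
    "max_backtracking_iters": 10,         # placeholder value
    "max_conjugate_gradient_iters": 10,   # placeholder value
    "recovery_regime_t": 0.5,             # placeholder value
}

def validate_hparams(hparams):
    """Raise if any Table 1 field is missing from a config dictionary."""
    required = set(acpo_hparams)
    missing = required - set(hparams)
    if missing:
        raise ValueError(f"missing hyperparameters: {sorted(missing)}")
    return True
```

A check like `validate_hparams` is useful when reproducing results from a table such as this one, since a silently absent hyperparameter (falling back to a library default) is a common source of irreproducible runs.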