ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Authors: Akhil Agnihotri, Rahul Jain, Haipeng Luo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym environments, show its superior empirical performance when compared to other state-of-the-art algorithms adapted for the ACMDPs. |
| Researcher Affiliation | Collaboration | 1University of Southern California, Los Angeles, CA, USA. RJ is also affiliated with Google DeepMind. |
| Pseudocode | Yes | Algorithm 1 Average-Constrained Policy Optimization (ACPO) |
| Open Source Code | No | Code of the ACPO implementation will be made available on GitHub. |
| Open Datasets | Yes | We work with the OpenAI Gym environments to train the various learning agents on the following tasks: Gather, Circle, Grid, and Bottleneck (see Figure 3 in Appendix A.6.1 for more details on the environments). For our experimental evaluation, we use several OpenAI Gym environments from Todorov et al. (2012). |
| Dataset Splits | No | The paper describes training steps and evaluation trajectories but does not explicitly provide percentages or counts for a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using implementations from specific GitHub repositories and OpenAI Gym environments but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 1. Hyperparameter Setup includes: No. of hidden layers, Activation, Initial log std, Batch size, GAE parameter (reward), GAE parameter (cost), Trust region step size δ, Learning rate for policy, Learning rate for reward critic net, Learning rate for cost critic net, Backtracking coeff., Max backtracking iterations, Max conjugate gradient iterations, Recovery regime parameter t. Also, Section 5.1 details neural network sizes. |
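
For reference, below is a minimal sketch of how the hyperparameters enumerated in the paper's Table 1 might be collected into a single configuration object for an ACPO-style implementation. The dictionary keys are hypothetical names chosen here for illustration, and the values are placeholder defaults, not the settings actually reported by the authors.

```python
# Hypothetical hyperparameter configuration mirroring the fields of the
# paper's Table 1. All values below are illustrative placeholders, NOT the
# settings reported in the paper; substitute the published values to reproduce.
acpo_config = {
    "hidden_layer_sizes": (64, 64),   # No. of hidden layers / their widths
    "activation": "tanh",             # Activation function
    "initial_log_std": -0.5,          # Initial log std of the Gaussian policy
    "batch_size": 2048,               # Samples collected per policy update
    "gae_lambda_reward": 0.95,        # GAE parameter (reward)
    "gae_lambda_cost": 0.95,          # GAE parameter (cost)
    "trust_region_delta": 0.01,       # Trust region step size δ
    "lr_policy": 3e-4,                # Learning rate for policy
    "lr_reward_critic": 1e-3,         # Learning rate for reward critic net
    "lr_cost_critic": 1e-3,           # Learning rate for cost critic net
    "backtracking_coeff": 0.8,        # Backtracking coefficient for line search
    "max_backtracking_iters": 10,     # Max backtracking iterations
    "max_cg_iters": 10,               # Max conjugate gradient iterations
    "recovery_t": 0.75,               # Recovery regime parameter t
}
```

A reproduction would pass such a configuration to the training loop alongside the environment-specific settings (network sizes per Section 5.1, training steps, and evaluation trajectories) described in the paper.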