On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Authors: Yiming Zhang, Keith W. Ross

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conducted experiments comparing the performance of ATRPO and TRPO on continuing control tasks. We consider three tasks (Ant, HalfCheetah, and Humanoid) from the MuJoCo physical simulator (Todorov et al., 2012) implemented using OpenAI Gym (Brockman et al., 2016)." |
| Researcher Affiliation | Academia | Yiming Zhang (New York University); Keith W. Ross (New York University Shanghai) |
| Pseudocode | Yes | Algorithm 1: Approximate Average Reward Policy Iteration; Algorithm 2: Average Reward TRPO (ATRPO) |
| Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | "We consider three tasks (Ant, HalfCheetah, and Humanoid) from the MuJoCo physical simulator (Todorov et al., 2012) implemented using OpenAI Gym (Brockman et al., 2016)." |
| Dataset Splits | No | The paper describes an evaluation protocol during training ("every 100,000 steps, we run 10 separate evaluation trajectories"), which serves a purpose similar to validation, but it does not define distinct training/validation/test splits by percentage or count, nor does it refer to a predefined validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions the MuJoCo physical simulator but provides no details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the MuJoCo physical simulator (Todorov et al., 2012) and OpenAI Gym (Brockman et al., 2016) but does not specify version numbers for these or for any other software dependencies, such as programming languages or libraries like PyTorch or TensorFlow. |
| Experiment Setup | No | The paper states, "Hyperparameter settings and other additional details can be found in Appendix H," indicating that the specific setup details appear in the appendix rather than in the main text. |