On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
Authors: Yiming Zhang, Keith W. Ross
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments comparing the performance of ATRPO and TRPO on continuing control tasks. We consider three tasks (Ant, HalfCheetah, and Humanoid) from the MuJoCo physics simulator (Todorov et al., 2012) implemented using OpenAI Gym (Brockman et al., 2016). |
| Researcher Affiliation | Academia | Yiming Zhang (New York University); Keith W. Ross (New York University Shanghai). |
| Pseudocode | Yes | Algorithm 1 Approximate Average Reward Policy Iteration; Algorithm 2 Average Reward TRPO (ATRPO) |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We consider three tasks (Ant, HalfCheetah, and Humanoid) from the MuJoCo physics simulator (Todorov et al., 2012) implemented using OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes an evaluation protocol during training ("every 100,000 steps, we run 10 separate evaluation trajectories"), which serves a purpose similar to validation. However, it does not define distinct training/validation/test splits in terms of percentages or counts, nor does it refer to a predefined validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions using the "MuJoCo physics simulator" but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the "MuJoCo physics simulator (Todorov et al., 2012)" and "OpenAI Gym (Brockman et al., 2016)", but it does not specify version numbers for these or any other software dependencies, such as programming languages or libraries like PyTorch or TensorFlow. |
| Experiment Setup | No | The paper states, "Hyperparameter settings and other additional details can be found in Appendix H." This indicates that specific setup details are not present in the main text, as required by the prompt. |
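The evaluation protocol quoted in the Dataset Splits row (every 100,000 training steps, average the return of 10 separate evaluation trajectories) can be sketched as a minimal training-loop skeleton. This is an illustrative sketch only, not the authors' code: the names `run_trajectory`, `train`, `EVAL_EVERY`, and `NUM_EVAL_EPISODES` are hypothetical, and the rollout is a stub standing in for a real MuJoCo/Gym episode.

```python
# Hedged sketch of the periodic evaluation protocol described in the paper.
# Constants come from the quoted protocol; all function names are assumptions.

EVAL_EVERY = 100_000    # evaluate every 100k environment steps (from the paper)
NUM_EVAL_EPISODES = 10  # 10 separate evaluation trajectories (from the paper)

def run_trajectory(policy, seed):
    """Stub rollout: returns a scalar episode return.

    In a real setup this would roll the policy in the MuJoCo/Gym
    environment and sum the rewards along one trajectory.
    """
    return float(policy(seed))

def train(policy, total_steps):
    """Training-loop skeleton that records periodic evaluation scores."""
    eval_scores = []
    for step in range(1, total_steps + 1):
        # ... one environment step and policy update would happen here ...
        if step % EVAL_EVERY == 0:
            returns = [run_trajectory(policy, i)
                       for i in range(NUM_EVAL_EPISODES)]
            eval_scores.append(sum(returns) / NUM_EVAL_EPISODES)
    return eval_scores

# A dummy "policy" whose return equals the seed, just to exercise the loop:
scores = train(lambda s: s, 300_000)
print(scores)  # → [4.5, 4.5, 4.5]
```

With 300,000 steps the loop produces three evaluation points, each the mean of the stub returns 0 through 9; in a real run each point would be the average return of 10 fresh evaluation episodes at that checkpoint.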