On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Authors: Yiming Zhang, Keith W. Ross

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conducted experiments comparing the performance of ATRPO and TRPO on continuing control tasks. We consider three tasks (Ant, HalfCheetah, and Humanoid) from the MuJoCo physical simulator (Todorov et al., 2012) implemented using OpenAI Gym (Brockman et al., 2016)." |
| Researcher Affiliation | Academia | Yiming Zhang (New York University); Keith W. Ross (New York University Shanghai) |
| Pseudocode | Yes | Algorithm 1: Approximate Average Reward Policy Iteration; Algorithm 2: Average Reward TRPO (ATRPO) |
| Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | "We consider three tasks (Ant, HalfCheetah, and Humanoid) from the MuJoCo physical simulator (Todorov et al., 2012) implemented using OpenAI Gym (Brockman et al., 2016)." |
| Dataset Splits | No | The paper describes an evaluation protocol during training ("every 100,000 steps, we run 10 separate evaluation trajectories"), which serves a purpose similar to validation, but it does not define distinct training/validation/test splits by percentage or count, nor does it refer to a predefined validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions the MuJoCo physical simulator but provides no details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the MuJoCo physical simulator (Todorov et al., 2012) and OpenAI Gym (Brockman et al., 2016) but does not specify version numbers for these or for any other software dependencies, such as programming languages or libraries like PyTorch or TensorFlow. |
| Experiment Setup | No | The paper states, "Hyperparameter settings and other additional details can be found in Appendix H," indicating that the specific setup details appear in the appendix rather than in the main text. |