Taylor Expansion of Discount Factors
Authors: Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the empirical performance of new algorithmic changes to the baseline algorithms. We focus on robotics control experiments with continuous state and action space. The tasks are available in OpenAI gym (Brockman et al., 2016), with backends such as MuJoCo (Todorov et al., 2012) and bullet physics (Coumans, 2015). We label the tasks as gym (G) and bullet (B) respectively. We always compare the undiscounted cumulative rewards evaluated under a default evaluation horizon T = 1000. (This protocol is sketched in code below the table.) |
| Researcher Affiliation | Collaboration | Yunhao Tang¹; affiliations: ¹Columbia University, New York, USA; ²DeepMind, London, UK; ³DeepMind, Paris, France. |
| Pseudocode | Yes | Algorithm 1: Estimating the Kth order expansion; Algorithm 2: Taylor expansion Q-function estimation; Algorithm 3: Taylor expansion update weighting. (The expansion these algorithms estimate is sketched below the table.) |
| Open Source Code | No | The paper does not provide any specific repository links or explicit statements about the release of source code for the methodology described. |
| Open Datasets | Yes | The tasks are available in OpenAI gym (Brockman et al., 2016), with backends such as MuJoCo (Todorov et al., 2012) and bullet physics (Coumans, 2015). |
| Dataset Splits | No | The paper mentions running experiments across "5 seeds" but does not provide specific dataset split information (e.g., percentages, sample counts, or predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The acknowledgements thank the Google Cloud Platform for computational resources, but the paper gives no specific hardware details (e.g., exact GPU/CPU models or memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper names software components and environments such as OpenAI gym, MuJoCo, and bullet physics, as well as baseline algorithms like PPO and TRPO, but it does not provide version numbers for any of these dependencies. |
| Experiment Setup | No | The main text states only: "Hyper-parameters. Throughout the experiments, we use the same hyper-parameters across all algorithms. The learning rate is tuned for the baseline PPO, and fixed across all algorithms. See Appendix F for further details." Concrete values are deferred to the appendix rather than specified in the main text. |
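
For context on the Pseudocode row: all three algorithms rest on the paper's central identity, a Taylor (Neumann-series) expansion of the value function at a large discount γ′ around a smaller discount γ. A minimal restatement in standard matrix notation (our notation, not a quote from the paper; P is the transition matrix, r the reward vector):

```latex
% Neumann-series expansion of V_{\gamma'} around V_{\gamma}
% (our notation; P = transition matrix, r = reward vector).
\begin{align*}
V_{\gamma'} &= (I - \gamma' P)^{-1} r \\
            &= \bigl( I - (\gamma' - \gamma)(I - \gamma P)^{-1} P \bigr)^{-1} (I - \gamma P)^{-1} r \\
            &= \sum_{k \ge 0} (\gamma' - \gamma)^k \bigl( (I - \gamma P)^{-1} P \bigr)^k V_{\gamma}.
\end{align*}
```

Truncating the series at k = K gives the Kth order expansion that Algorithm 1 estimates from samples; the series converges whenever (γ′ − γ)/(1 − γ) < 1.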
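A quick numerical sanity check of that identity on a small random tabular MDP. This is a hypothetical setup for illustration only; the paper's experiments use deep RL agents, not tabular dynamic programming:

```python
# Sanity-check the Taylor expansion of discount factors on a random
# tabular MDP (illustrative only; not the paper's experimental setup).
import numpy as np

rng = np.random.default_rng(0)
S = 10                                   # number of states
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
r = rng.random(S)                        # state-dependent rewards

gamma, gamma_p = 0.9, 0.95               # expand V_{gamma'} around gamma
I = np.eye(S)

v_gamma = np.linalg.solve(I - gamma * P, r)     # V_gamma
v_exact = np.linalg.solve(I - gamma_p * P, r)   # exact V_{gamma'}

# One expansion step: M = (gamma' - gamma) * (I - gamma P)^{-1} P
M = (gamma_p - gamma) * np.linalg.solve(I - gamma * P, P)

# Accumulate the K-th order expansion and track the error to the exact value.
term, v_approx = v_gamma.copy(), v_gamma.copy()
for k in range(1, 6):
    term = M @ term                      # (gamma'-gamma)^k ((I-gamma P)^{-1} P)^k V_gamma
    v_approx += term
    print(f"K={k}: max error {np.abs(v_approx - v_exact).max():.2e}")
```

The printed error shrinks by roughly (γ′ − γ)/(1 − γ) = 0.5 per order, matching the convergence condition above.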
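The Research Type row quotes the evaluation protocol: undiscounted cumulative reward over a fixed horizon T = 1000. A minimal sketch of that protocol using the classic (pre-0.26) gym API; the environment name and the random stand-in policy are placeholders, not the paper's code:

```python
# Undiscounted-return evaluation over a fixed horizon, per the quoted
# protocol. Environment name and random policy are placeholders.
import gym

env = gym.make("HalfCheetah-v2")         # any Gym/MuJoCo or bullet task
T = 1000                                 # default evaluation horizon

obs = env.reset()
total_reward = 0.0
for t in range(T):
    action = env.action_space.sample()   # stand-in for the trained policy
    obs, reward, done, info = env.step(action)
    total_reward += reward               # undiscounted: no gamma factor
    if done:
        break
print(f"undiscounted return over T={T}: {total_reward:.1f}")
```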