Taylor Expansion of Discount Factors
Authors: Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the empirical performance of new algorithmic changes to the baseline algorithms. We focus on robotics control experiments with continuous state and action space. The tasks are available in OpenAI gym (Brockman et al., 2016), with backends such as MuJoCo (Todorov et al., 2012) and bullet physics (Coumans, 2015). We label the tasks as gym (G) and bullet (B) respectively. We always compare the undiscounted cumulative rewards evaluated under a default evaluation horizon T = 1000. (This protocol is sketched in code below the table.) |
| Researcher Affiliation | Collaboration | Yunhao Tang¹; affiliations: ¹Columbia University, New York, USA; ²DeepMind, London, UK; ³DeepMind, Paris, France. |
| Pseudocode | Yes | Algorithm 1: Estimating the Kth order expansion; Algorithm 2: Taylor expansion Q-function estimation; Algorithm 3: Taylor expansion update weighting. (The expansion these algorithms estimate is sketched below the table.) |
| Open Source Code | No | The paper does not provide any specific repository links or explicit statements about the release of source code for the methodology described. |
| Open Datasets | Yes | The tasks are available in OpenAI gym (Brockman et al., 2016), with backends such as MuJoCo (Todorov et al., 2012) and bullet physics (Coumans, 2015). |
| Dataset Splits | No | The paper mentions running experiments across "5 seeds" but does not provide specific dataset split information (e.g., percentages, sample counts, or predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The acknowledgements thank the Google Cloud Platform for computational resources, but the paper gives no specific hardware details (e.g., exact GPU/CPU models or memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper names software components and environments such as OpenAI gym, MuJoCo, and bullet physics, as well as baseline algorithms like PPO and TRPO, but it does not provide version numbers for any of these dependencies. |
| Experiment Setup | No | The main text states only: "Hyper-parameters. Throughout the experiments, we use the same hyper-parameters across all algorithms. The learning rate is tuned for the baseline PPO, and fixed across all algorithms. See Appendix F for further details." Concrete values are deferred to the appendix rather than specified in the main text. |
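
For context on the Pseudocode row: all three algorithms rest on the paper's central identity, a Taylor (Neumann-series) expansion of the value function at a large discount γ′ around a smaller discount γ. A minimal restatement in standard matrix notation (our notation, not a quote from the paper; P is the transition matrix, r the reward vector):

```latex
% Neumann-series expansion of V_{\gamma'} around V_{\gamma}
% (our notation; P = transition matrix, r = reward vector).
\begin{align*}
V_{\gamma'} &= (I - \gamma' P)^{-1} r \\
            &= \bigl( I - (\gamma' - \gamma)(I - \gamma P)^{-1} P \bigr)^{-1} (I - \gamma P)^{-1} r \\
            &= \sum_{k \ge 0} (\gamma' - \gamma)^k \bigl( (I - \gamma P)^{-1} P \bigr)^k V_{\gamma}.
\end{align*}
```

Truncating the series at k = K gives the Kth order expansion that Algorithm 1 estimates from samples; the series converges whenever (γ′ − γ)/(1 − γ) < 1.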
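A quick numerical sanity check of that identity on a small random tabular MDP. This is a hypothetical setup for illustration only; the paper's experiments use deep RL agents, not tabular dynamic programming:

```python
# Sanity-check the Taylor expansion of discount factors on a random
# tabular MDP (illustrative only; not the paper's experimental setup).
import numpy as np

rng = np.random.default_rng(0)
S = 10                                   # number of states
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
r = rng.random(S)                        # state-dependent rewards

gamma, gamma_p = 0.9, 0.95               # expand V_{gamma'} around gamma
I = np.eye(S)

v_gamma = np.linalg.solve(I - gamma * P, r)     # V_gamma
v_exact = np.linalg.solve(I - gamma_p * P, r)   # exact V_{gamma'}

# One expansion step: M = (gamma' - gamma) * (I - gamma P)^{-1} P
M = (gamma_p - gamma) * np.linalg.solve(I - gamma * P, P)

# Accumulate the K-th order expansion and track the error to the exact value.
term, v_approx = v_gamma.copy(), v_gamma.copy()
for k in range(1, 6):
    term = M @ term                      # (gamma'-gamma)^k ((I-gamma P)^{-1} P)^k V_gamma
    v_approx += term
    print(f"K={k}: max error {np.abs(v_approx - v_exact).max():.2e}")
```

The printed error shrinks by roughly (γ′ − γ)/(1 − γ) = 0.5 per order, matching the convergence condition above.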
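The Research Type row quotes the evaluation protocol: undiscounted cumulative reward over a fixed horizon T = 1000. A minimal sketch of that protocol using the classic (pre-0.26) gym API; the environment name and the random stand-in policy are placeholders, not the paper's code:

```python
# Undiscounted-return evaluation over a fixed horizon, per the quoted
# protocol. Environment name and random policy are placeholders.
import gym

env = gym.make("HalfCheetah-v2")         # any Gym/MuJoCo or bullet task
T = 1000                                 # default evaluation horizon

obs = env.reset()
total_reward = 0.0
for t in range(T):
    action = env.action_space.sample()   # stand-in for the trained policy
    obs, reward, done, info = env.step(action)
    total_reward += reward               # undiscounted: no gamma factor
    if done:
        break
print(f"undiscounted return over T={T}: {total_reward:.1f}")
```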