Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
TRUNCATED HORIZON POLICY SEARCH: COMBINING REINFORCEMENT LEARNING & IMITATION LEARNING
Authors: Wen Sun, J. Andrew Bagnell, Byron Boots
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that a gradient-based implementation of THOR can achieve superior performance compared to RL baselines and IL baselines even when the oracle is sub-optimal. ... Section 5 (Experiments): We evaluated THOR on robotics simulators from Open AI Gym (Brockman et al., 2016). |
| Researcher Affiliation | Academia | Wen Sun, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; J. Andrew Bagnell, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Byron Boots, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA |
| Pseudocode | Yes | Algorithm 1 Truncated Horizon Policy Search (THOR) |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code for the methodology described. |
| Open Datasets | No | The paper states that experiments were conducted on 'robotics simulators from Open AI Gym (Brockman et al., 2016)'. Open AI Gym provides simulation environments rather than pre-collected datasets, and the paper neither links to nor cites a dataset used for training. The data appears to be generated during the experiments. |
| Dataset Splits | No | The paper does not provide specific information about training/test/validation dataset splits, nor does it mention any validation sets or their proportions. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU models, GPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and baselines like 'Open AI Gym' and 'TRPO-GAE' but does not provide specific version numbers for any of its software dependencies. |
| Experiment Setup | No | The paper states that 'we simply use the recommended parameters in the code-base from TRPO-GAE (Schulman et al., 2016)' and that they 'did not tune any parameters except the truncation length k.' It does not explicitly list the specific hyperparameter values or detailed training configurations used in the main text. |