TRUNCATED HORIZON POLICY SEARCH: COMBINING REINFORCEMENT LEARNING & IMITATION LEARNING

Authors: Wen Sun, J. Andrew Bagnell, Byron Boots

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate that a gradient-based implementation of THOR can achieve superior performance compared to RL baselines and IL baselines even when the oracle is sub-optimal. ... (Section 5, Experiments) We evaluated THOR on robotics simulators from OpenAI Gym (Brockman et al., 2016).
Researcher Affiliation | Academia | Wen Sun, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA (wensun@cs.cmu.edu); J. Andrew Bagnell, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA (dbagnell@cs.cmu.edu); Byron Boots, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA (bboots@cc.gatech.edu)
Pseudocode | Yes | Algorithm 1: Truncated Horizon Policy Search (THOR) (see the sketch after this table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper states that experiments were conducted on 'robotics simulators from OpenAI Gym (Brockman et al., 2016)'. While OpenAI Gym provides simulation environments, the paper does not cite or link to any specific pre-collected dataset used for training; the data appears to be generated online during the experiments (see the rollout sketch after this table).
Dataset Splits | No | The paper does not describe training/validation/test dataset splits or report any split proportions.
Hardware Specification | No | The paper does not provide specific hardware details, such as CPU models, GPU models, or memory, used for running the experiments.
Software Dependencies | No | The paper mentions software components and baselines such as OpenAI Gym and TRPO-GAE but does not provide version numbers for any of its software dependencies.
Experiment Setup | No | The paper states that 'we simply use the recommended parameters in the code-base from TRPO-GAE (Schulman et al., 2016)' and that the authors 'did not tune any parameters except the truncation length k'. It does not explicitly list the specific hyperparameter values or detailed training configurations in the main text (see the configuration sketch after this table).
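
The Pseudocode row refers to the paper's Algorithm 1. As a reading aid, here is a minimal Python sketch of the core idea described there: rewards are reshaped with an oracle (expert) value function V^e, and the policy is optimized against a signal that is truncated after k steps. The function names (`oracle_value`, `k_step_advantages`), the reward-based (rather than cost-based) sign convention, and the simple windowed sum are illustrative assumptions; the authors' actual implementation sits inside a TRPO-GAE update and is not reproduced here.

```python
import numpy as np

def reshaped_reward(r, s, s_next, oracle_value, gamma=0.99):
    """Potential-based reshaping with the oracle value function V^e:
    r'(s, a, s') = r + gamma * V^e(s') - V^e(s)."""
    return r + gamma * oracle_value(s_next) - oracle_value(s)

def k_step_advantages(rewards, states, oracle_value, k, gamma=0.99):
    """Truncated-horizon signal: discounted sum of the next k reshaped rewards.
    A simplified stand-in for the paper's k-step advantage estimator."""
    shaped = [reshaped_reward(r, s, s_next, oracle_value, gamma)
              for r, s, s_next in zip(rewards, states[:-1], states[1:])]
    discounts = gamma ** np.arange(k)
    return np.array([
        sum(d * x for d, x in zip(discounts, shaped[t:t + k]))
        for t in range(len(shaped))
    ])
```

When k = 1 this reduces to a one-step look-ahead around the oracle's value function, and as k grows it approaches ordinary unshaped, full-horizon policy search, which matches the trade-off the paper describes between imitation-like and pure RL behavior.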
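The Open Datasets row notes that training data is generated on the fly from OpenAI Gym simulators rather than loaded from a fixed dataset. The snippet below is a hedged illustration of that kind of online rollout collection using the classic Gym API of the paper's era; the environment id, step budget, and random action selection are placeholders, not settings reported by the authors.

```python
import gym  # OpenAI Gym (Brockman et al., 2016), as cited by the paper

def collect_rollout(env_name="Hopper-v1", max_steps=1000, seed=0):
    """Generate one trajectory of (state, action, reward) tuples online.
    Uses the pre-0.26 Gym API; all arguments here are illustrative."""
    env = gym.make(env_name)
    env.seed(seed)
    obs, done, trajectory = env.reset(), False, []
    while not done and len(trajectory) < max_steps:
        action = env.action_space.sample()  # random policy as a stand-in
        next_obs, reward, done, _info = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory
```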
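Finally, for the Experiment Setup row: the paper says it keeps the TRPO-GAE codebase defaults and varies only the truncation length k. The configuration sketch below shows what such a sweep might look like; every numeric value (discount, GAE lambda, batch size, and the particular k values) is a hypothetical placeholder, not a hyperparameter reported in the paper.

```python
# Hypothetical run configuration: only the fact that k alone is swept comes
# from the paper; all numeric values below are illustrative placeholders.
BASE_CONFIG = {
    "algorithm": "TRPO-GAE",  # baseline codebase named in the paper
    "gamma": 0.99,            # placeholder discount factor
    "gae_lambda": 0.97,       # placeholder GAE parameter
    "batch_size": 5000,       # placeholder samples per iteration
}

def make_configs(truncation_lengths=(1, 5, 10, 20)):
    """One run configuration per truncation length k (values illustrative)."""
    return [dict(BASE_CONFIG, k=k) for k in truncation_lengths]
```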