Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Authors: Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments with deep neural networks on various robotics control simulators and on a dependency parsing sequential prediction task show that AggreVaTeD can achieve expert-level performance and even super-expert performance when the oracle is sub-optimal, a result rarely achieved by non-interactive IL approaches. Empirical results demonstrate that by leveraging an oracle, IL can learn much faster than RL.
Researcher Affiliation | Academia | (1) Robotics Institute, Carnegie Mellon University, USA; (2) Machine Learning Department, Carnegie Mellon University, USA; (3) College of Computing, Georgia Institute of Technology, USA. Correspondence to: Wen Sun <wensun@cs.cmu.edu>.
Pseudocode | Yes | Algorithm 1 AggreVaTeD (Differentiable AggreVaTe)
Open Source Code | No | The paper does not explicitly provide a link to open-source code for the described methodology or state that the code is publicly available.
Open Datasets | Yes | We consider CartPole Balancing, Acrobot Swing-up, Hopper and Walker. We consider a sequential prediction problem: transition-based dependency parsing for handwritten algebra with raw image data (Duyck & Gordon, 2015).
Dataset Splits | Yes | The dataset consists of 400 sets of handwritten algebra equations. We use 80% for training, 10% for validation, and 10% for testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions algorithms and tools (e.g., ADAM, REINFORCE, MuJoCo) but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We use a one-layer (16 hidden units) neural network with ReLU activation functions to represent the policy for the Cart-pole and Acrobot benchmarks. We set the number of roll-outs K = 50 and horizon H = 500 for CartPole and H = 200 for Acrobot. For the scheduling rate {α_i}, we set all α_i = 0: namely, we did not roll-in using the expert's actions during training.
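
The experiment setup quoted above can be illustrated with a minimal Python sketch. It is not the authors' code, since the paper releases none. The hidden width (16), K = 50, H = 500 and the scheduling rate of 0 come from the quoted setup. Everything else is an assumption made for illustration: the softmax output over discrete actions, the observation size 4 and action count 2 for CartPole, the Gaussian weight initialisation, and the names SetupConfig, init_policy, action_probs and roll_in_action.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    hidden_units: int = 16   # one hidden layer with 16 units (from the setup)
    num_rollouts: int = 50   # K = 50
    horizon: int = 500       # H = 500 for CartPole (200 for Acrobot)
    alpha: float = 0.0       # scheduling rate; 0 means no expert roll-in


def init_policy(obs_dim, num_actions, hidden_units, rng):
    """Create weights for obs -> hidden (ReLU) -> action logits.

    The Gaussian init with std 0.1 is an assumption, not from the paper.
    """
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden_units, obs_dim), "b1": [0.0] * hidden_units,
        "W2": matrix(num_actions, hidden_units), "b2": [0.0] * num_actions,
    }


def action_probs(params, obs):
    """Forward pass: ReLU hidden layer followed by a softmax over actions."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, obs)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(params["W2"], params["b2"])]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def roll_in_action(params, obs, expert_action, alpha, rng):
    """Mixture roll-in: follow the expert with probability alpha,
    otherwise sample from the learner's policy."""
    if rng.random() < alpha:
        return expert_action
    probs = action_probs(params, obs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: with alpha = 0 every roll-in action comes from the learner.
rng = random.Random(0)
cfg = SetupConfig()
policy = init_policy(obs_dim=4, num_actions=2, hidden_units=cfg.hidden_units, rng=rng)
action = roll_in_action(policy, [0.0, 0.1, -0.05, 0.0], expert_action=1,
                        alpha=cfg.alpha, rng=rng)
```

With alpha fixed at 0 the expert branch is never taken, which matches the quoted statement that training did not roll in with the expert's actions. A nonzero schedule would instead mix expert and learner actions during roll-in.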