Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Authors: Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with deep neural networks on various robotics control simulators and on a dependency parsing sequential prediction task show that AggreVaTeD can achieve expert-level performance and even super-expert performance when the oracle is sub-optimal, a result rarely achieved by non-interactive IL approaches. Empirical results demonstrate that by leveraging an oracle, IL can learn much faster than RL. |
| Researcher Affiliation | Academia | (1) Robotics Institute, Carnegie Mellon University, USA; (2) Machine Learning Department, Carnegie Mellon University, USA; (3) College of Computing, Georgia Institute of Technology, USA. Correspondence to: Wen Sun <wensun@cs.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1: AggreVaTeD (Differentiable AggreVaTe) |
| Open Source Code | No | The paper does not explicitly provide a link to open-source code for the described methodology or state that the code is publicly available. |
| Open Datasets | Yes | We consider Cart-Pole Balancing, Acrobot Swing-up, Hopper, and Walker. We consider a sequential prediction problem: transition-based dependency parsing for handwritten algebra with raw image data (Duyck & Gordon, 2015). |
| Dataset Splits | Yes | The dataset consists of 400 sets of handwritten algebra equations. We use 80% for training, 10% for validation, and 10% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and tools (e.g., ADAM, REINFORCE, MuJoCo) but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We use a one-layer (16 hidden units) neural network with ReLU activation functions to represent the policy for the Cart-Pole and Acrobot benchmarks. We set the number of rollouts K = 50 and horizon H = 500 for Cart-Pole and H = 200 for Acrobot. For the scheduling rate {α_i}, we set all α_i = 0: namely, we did not roll in using the expert's actions during training. |
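The Pseudocode and Experiment Setup rows describe a differentiable version of AggreVaTe trained with a small ReLU policy. As a rough illustration only, the NumPy sketch below assumes discrete actions and an oracle that supplies cost-to-go values Q*(s, a) for states visited by the learner's own roll-ins (all α_i = 0). It minimizes the state-averaged expected cost Σ_a π(a|s) Q*(s, a) with a one-hidden-layer, 16-unit ReLU policy. The function names, initialization scale, learning rate, and random placeholder data are hypothetical, and plain gradient descent stands in for the optimizers the paper mentions (e.g., ADAM), so this is not the authors' implementation.

```python
import numpy as np


def init_policy(obs_dim, n_actions, hidden=16, seed=0):
    """One hidden layer of `hidden` ReLU units mapping observations to action logits."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, size=(hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }


def forward(params, S):
    """Return hidden pre-activations, hidden activations, and action probabilities."""
    h_pre = S @ params["W1"] + params["b1"]
    h = np.maximum(h_pre, 0.0)  # ReLU
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)
    return h_pre, h, pi


def aggrevated_loss_and_grad(params, S, Q):
    """Mean over states of sum_a pi(a|s) Q*(s, a), plus its analytic gradient.

    S: (N, obs_dim) states gathered by rolling in with the learner only (alpha_i = 0).
    Q: (N, n_actions) oracle cost-to-go Q*(s, a), supplied externally (hypothetical here).
    """
    n = S.shape[0]
    h_pre, h, pi = forward(params, S)
    expected_cost = np.sum(pi * Q, axis=1, keepdims=True)  # sum_a pi(a|s) Q*(s, a)
    loss = float(expected_cost.mean())
    # d/d(logit_b) of sum_a pi_a Q_a under a softmax is pi_b * (Q_b - expected_cost).
    dlogits = pi * (Q - expected_cost) / n
    dh_pre = (dlogits @ params["W2"].T) * (h_pre > 0)  # back-propagate through ReLU
    grads = {
        "W2": h.T @ dlogits,
        "b2": dlogits.sum(axis=0),
        "W1": S.T @ dh_pre,
        "b1": dh_pre.sum(axis=0),
    }
    return loss, grads


def gradient_step(params, S, Q, lr=0.1):
    """One plain gradient-descent step; returns the loss before the update."""
    loss, grads = aggrevated_loss_and_grad(params, S, Q)
    for key in params:
        params[key] -= lr * grads[key]
    return loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S = rng.normal(size=(64, 4))   # placeholder states (e.g. 4-D cart-pole observations)
    Q = rng.uniform(size=(64, 2))  # placeholder oracle costs for 2 discrete actions
    params = init_policy(obs_dim=4, n_actions=2)
    for _ in range(100):
        gradient_step(params, S, Q)
    print("final loss:", aggrevated_loss_and_grad(params, S, Q)[0])
```

In a full loop one would, under these same assumptions, roll out the updated policy (e.g., K = 50 times per iteration) to collect fresh states, query the oracle for Q*, and take another step; that outer loop is omitted here to keep the sketch short.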