reproducibilityindex.ai

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Authors: Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

ICML 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments with deep neural networks on various robotics control simulators and on a dependency parsing sequential prediction task show that AggreVaTeD can achieve expert-level performance and even super-expert performance when the oracle is sub-optimal, a result rarely achieved by non-interactive IL approaches. Empirical results demonstrate that by leveraging an oracle, IL can learn much faster than RL.
Researcher Affiliation	Academia	(1) Robotics Institute, Carnegie Mellon University, USA; (2) Machine Learning Department, Carnegie Mellon University, USA; (3) College of Computing, Georgia Institute of Technology, USA. Correspondence to: Wen Sun <wensun@cs.cmu.edu>.
Pseudocode	Yes	Algorithm 1 AggreVaTeD (Differentiable AggreVaTe)
Open Source Code	No	The paper does not explicitly provide a link to open-source code for the described methodology or state that the code is publicly available.
Open Datasets	Yes	We consider CartPole Balancing, Acrobot Swing-up, Hopper and Walker. We consider a sequential prediction problem: transition-based dependency parsing for handwritten algebra with raw image data (Duyck & Gordon, 2015).
Dataset Splits	Yes	The dataset consists of 400 sets of handwritten algebra equations. We use 80% for training, 10% for validation, and 10% for testing.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions algorithms and tools (e.g., ADAM, REINFORCE, MuJoCo) but does not provide specific version numbers for software dependencies.
Experiment Setup	Yes	We use a one-layer (16 hidden units) neural network with ReLU activation functions to represent the policy for the CartPole and Acrobot benchmarks. We set the number of rollouts K = 50 and horizon H = 500 for CartPole and H = 200 for Acrobot. For the scheduling rate {α_i}, we set all α_i = 0: namely we did not roll-in using the expert's actions during training.
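
The Dataset Splits and Experiment Setup rows can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code. It uses the figures quoted in the table (400 equation sets, an 80/10/10 split, 16 hidden units, K = 50, H = 500 and H = 200). Everything else is an assumption made for the example: the function names, the random seed, the weight initialization, the softmax output layer, and the observation and action sizes used in the usage note.

```python
import math
import random

# Figures quoted in the table above.
NUM_EQUATION_SETS = 400
SPLIT_FRACTIONS = {"train": 0.8, "validation": 0.1, "test": 0.1}
HIDDEN_UNITS = 16
NUM_ROLLOUTS_K = 50
HORIZON_H = {"cartpole": 500, "acrobot": 200}


def split_indices(n, fractions, seed=0):
    """Shuffle range(n) and cut it into consecutive splits.

    The seed and the shuffle-then-slice scheme are assumptions; the paper
    only reports the 80/10/10 proportions.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    names = list(fractions)
    splits, start = {}, 0
    for position, name in enumerate(names):
        # The last split takes the remainder so rounding never drops an index.
        is_last = position == len(names) - 1
        end = n if is_last else start + round(n * fractions[name])
        splits[name] = indices[start:end]
        start = end
    return splits


def init_policy(obs_dim, num_actions, hidden=HIDDEN_UNITS, seed=0):
    """Create weights for a one-hidden-layer policy (hypothetical init scheme)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden, obs_dim),
        "b1": [0.0] * hidden,
        "W2": matrix(num_actions, hidden),
        "b2": [0.0] * num_actions,
    }


def policy_probs(params, obs):
    """Forward pass: linear -> ReLU -> linear -> softmax (softmax is assumed)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, obs)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["W2"], params["b2"])
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `split_indices(NUM_EQUATION_SETS, SPLIT_FRACTIONS)` yields 320 training, 40 validation, and 40 test indices. Calling `policy_probs(init_policy(4, 2), [0.0, 0.1, 0.0, -0.1])` returns a two-element probability vector, where the observation size 4 and action count 2 are placeholder values rather than figures from the paper.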