Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Authors: Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with deep neural networks on various robotics control simulators and on a dependency parsing sequential prediction task show that AggreVaTeD can achieve expert-level performance and even super-expert performance when the oracle is sub-optimal, a result rarely achieved by non-interactive IL approaches. Empirical results demonstrate that by leveraging an oracle, IL can learn much faster than RL. |
| Researcher Affiliation | Academia | 1Robotics Institute, Carnegie Mellon University, USA 2Machine Learning Department, Carnegie Mellon University, USA 3College of Computing, Georgia Institute of Technology, USA. Correspondence to: Wen Sun <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 AggreVaTeD (Differentiable AggreVaTe) |
| Open Source Code | No | The paper does not explicitly provide a link to open-source code for the described methodology or state that the code is publicly available. |
| Open Datasets | Yes | We consider CartPole Balancing, Acrobot Swing-up, Hopper and Walker. We consider a sequential prediction problem: transition-based dependency parsing for handwritten algebra with raw image data (Duyck & Gordon, 2015). |
| Dataset Splits | Yes | The dataset consists of 400 sets of handwritten algebra equations. We use 80% for training, 10% for validation, and 10% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and tools (e.g., ADAM, REINFORCE, MuJoCo) but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We use a one-layer (16 hidden units) neural network with ReLU activation functions to represent the policy for the Cart-pole and Acrobot benchmarks. We set the number of roll-outs K = 50 and horizon H = 500 for CartPole and H = 200 for Acrobot. For the scheduling rate {α_i}, we set all α_i = 0: namely we did not roll-in using the expert's actions during training. |
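
The 80% / 10% / 10% split reported in the Dataset Splits row works out to 320 / 40 / 40 of the 400 equation sets. The sketch below only illustrates that arithmetic. The function name `split_indices`, the seeded shuffle, and the seed value are assumptions made for illustration; the table does not say how the authors assigned items to each partition.

```python
import random


def split_indices(n_items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle range(n_items) and cut it into train/validation/test index lists.

    Hypothetical helper: the shuffle and the seed are illustrative assumptions,
    not the procedure used in the paper.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG, leaves global state untouched

    n_train = round(n_items * train_frac)
    n_val = round(n_items * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# 400 sets of handwritten algebra equations, as stated in the table.
train, val, test = split_indices(400)
print(len(train), len(val), len(test))
```

Giving the remainder to the test partition keeps the three lists disjoint and exhaustive even when the fractions do not divide the item count evenly.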