Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Visual Adversarial Imitation Learning using Variational Models
Authors: Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments involving several vision-based locomotion and manipulation tasks, we find that V-MAIL learns successful visuomotor policies in a sample-efficient manner, has better stability compared to prior work, and also achieves higher asymptotic performance. |
| Researcher Affiliation | Collaboration | Rafael Rafailov¹, Tianhe Yu¹, Aravind Rajeswaran²,³, Chelsea Finn¹; ¹Stanford University, ²University of Washington, ³Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 V-MAIL: Variational Model-Based Adversarial Imitation Learning |
| Open Source Code | No | All results including videos can be found online at https://sites.google.com/view/variational-mail. |
| Open Datasets | Yes | These consist of two locomotion environments from the DeepMind Control Suite [30], the classic Car Racing environment from OpenAI Gym [31], and two dexterous manipulation tasks using the D'Claw [32] and Shadow Hand platforms. |
| Dataset Splits | No | The agent is provided with a fixed set of expert demonstrations collected by executing an expert policy π_E, which we assume is optimal under the unknown reward function. |
| Hardware Specification | Yes | All experiments were carried out on a single Titan RTX GPU using an internal cluster for about 1000 GPU hours. |
| Software Dependencies | No | For the former, we choose DAC [18] as a representative approach, which we equip with DrQ data augmentation for greater performance on vision-based tasks. |
| Experiment Setup | No | For implementation details, see the appendix. |