TRAIL: Near-Optimal Imitation Learning with Suboptimal Data

Authors: Mengjiao Yang, Sergey Levine, Ofir Nachum

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the practicality of our objective through experiments on a set of navigation and locomotion tasks. Our results verify the benefits suggested by our theory and show that TRAIL is able to improve baseline imitation learning by up to 4x in performance." (Section 5, Experimental Evaluation:) "We now evaluate TRAIL on a set of navigation and locomotion tasks (Figure 2). Our evaluation is designed to study how well TRAIL can improve imitation learning with limited expert data by leveraging available suboptimal offline data."
Researcher Affiliation | Collaboration | Mengjiao Yang (UC Berkeley, Google Brain; sherryy@google.com); Sergey Levine (UC Berkeley, Google Brain); Ofir Nachum (Google Brain)
Pseudocode | No | The paper describes the algorithm steps in text and flowcharts (Figure 1), but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository.
Open Datasets | Yes | "We include the challenging Ant Maze navigation tasks from D4RL (Fu et al., 2020) and low (1-DoF) to high (21-DoF) dimensional locomotion tasks from the DeepMind Control Suite (Tassa et al., 2018). For the suboptimal data in Ant Maze, we use the full D4RL datasets antmaze-large-diverse-v0, antmaze-medium-play-v0, antmaze-medium-diverse-v0, and antmaze-medium-play-v0. For the DeepMind Control Suite set of tasks, we use the RL Unplugged (Gulcehre et al., 2020) dataset."
Dataset Splits | No | The paper describes how expert and suboptimal datasets are used for training and pretraining, e.g., "we imitate from 1% or 2.5% of the expert datasets". However, it does not specify explicit training/validation/test splits of these datasets or mention a distinct validation set used for hyperparameter tuning.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components like the "Adam optimizer" and the "Swish (Ramachandran et al., 2017) activation function", but it does not specify versions for programming languages, libraries, or other software dependencies necessary for replication.
Experiment Setup | Yes | "During pretraining, we use the Adam optimizer with learning rate 0.0003 for 200k iterations with batch size 256 for all methods that require pretraining. Behavioral cloning for all methods including vanilla BC is trained with learning rate 0.0001 for 1M iterations."
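The reported hyperparameters can be collected into a minimal sketch. Only the values below (learning rates, iteration counts, batch size) come from the paper; the Adam update shown is a standard reference implementation (Kingma & Ba defaults), not the authors' code, and the quadratic toy objective is purely illustrative.

```python
# Hyperparameters as reported in the paper's experiment setup.
PRETRAIN = {"optimizer": "Adam", "lr": 3e-4, "iterations": 200_000, "batch_size": 256}
BC = {"optimizer": "Adam", "lr": 1e-4, "iterations": 1_000_000}


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (standard defaults; illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v


# Toy usage: minimize f(x) = x^2 with the pretraining learning rate.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * x
    x, m, v = adam_step(x, grad, m, v, t, lr=PRETRAIN["lr"])
```

With a consistent gradient sign, Adam's effective step is roughly the learning rate, so 1,000 steps at lr 3e-4 move the parameter by about 0.3, illustrating why the paper's 200k/1M iteration budgets are plausible at these small learning rates.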