Self-Adaptive Imitation Learning: Learning Tasks with Delayed Rewards from Sub-optimal Demonstrations
Authors: Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou
AAAI 2022, pp. 9269-9277 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results show that not only does SAIL significantly improve the sample efficiency, but it also leads to higher asymptotic performance across different continuous control tasks, compared with the state-of-the-art. In this section, we study how SAIL achieves the objective of imitation learning and exploration in an environment with delayed rewards. Extensive experiments have been conducted to answer the following key questions: |
| Researcher Affiliation | Collaboration | Zhuangdi Zhu (1), Kaixiang Lin (1), Bo Dai (2), Jiayu Zhou (1); affiliations: 1 Michigan State University, 2 Google Brain |
| Pseudocode | Yes | Algorithm 1: Self-Adaptive Imitation Learning |
| Open Source Code | No | The paper does not state that the SAIL implementation is open-sourced and does not link to a code repository. It only references a third-party library: "we built SAIL on a TD3 framework (Fujimoto, Van Hoof, and Meger 2018) based on stable-baselines implementations," where footnote 1 points to https://stable-baselines.readthedocs.io/en/master/. |
| Open Datasets | Yes | It is tested on four popular MuJoCo tasks (footnote 2): Walker2d-v2, Hopper-v2, HalfCheetah-v2, and Swimmer-v2. For each task, teacher demonstrations are generated from a deterministic policy that was pre-trained to be sub-optimal. |
| Dataset Splits | No | The paper mentions that "All experiments are conducted using one imperfect demonstration trajectory on five random seeds" and "Models are evaluated after training using 10^6 interaction samples." However, it does not specify explicit train/validation/test splits (by percentage or count) as would be expected in a supervised-learning setup; in RL, evaluation is instead performed by rolling out the learned policy in the environment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper states: "we built SAIL on a TD3 framework (Fujimoto, Van Hoof, and Meger 2018) based on stable-baselines implementations" (footnote 1). While it names stable-baselines, it does not give a version number for that library or for any other dependency (Python, PyTorch/TensorFlow, CUDA) needed for full reproducibility. A hedged sketch of such a TD3/stable-baselines setup appears after the table. |
| Experiment Setup | No | The paper mentions: "All experiments are conducted using one imperfect demonstration trajectory on five random seeds, with each trajectory containing no more than 1000 transitions. Models are evaluated after training using 10^6 interaction samples." This fixes the number of seeds, the demonstration budget, and the interaction budget, but it omits key hyperparameters such as learning rates, batch sizes, optimizer settings, and network architectures, so the setup is not fully reproducible. A hedged sketch of the demonstration-collection step also appears after the table. |
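
Because the paper only names TD3 and stable-baselines without versions or hyperparameters, the following is a minimal, hypothetical sketch of how such a TD3 baseline could be set up on one of the reported MuJoCo tasks under the quoted protocol (10^6 interaction samples). It is not the authors' code; the choice of environment, seed, and save path are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' implementation: a TD3 agent from
# stable-baselines trained on one of the four reported MuJoCo tasks for
# 10^6 environment interactions, as quoted in the Experiment Setup row.
import gym
from stable_baselines import TD3
from stable_baselines.td3.policies import MlpPolicy

env = gym.make("Walker2d-v2")            # one of the four reported tasks (assumed choice)
model = TD3(MlpPolicy, env, seed=0, verbose=1)   # seed 0 stands in for one of the five seeds
model.learn(total_timesteps=int(1e6))    # matches the reported 10^6 interaction samples
model.save("td3_walker2d_seed0")         # illustrative output path
```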
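
The quoted setup uses a single imperfect demonstration trajectory of at most 1000 transitions, rolled out from a pre-trained, deliberately sub-optimal deterministic policy. The sketch below shows one plausible way to collect such a trajectory; `teacher_policy` is an assumed stand-in for that pre-trained policy, and the old 4-tuple gym step API is assumed to match the v2 MuJoCo tasks.

```python
# Hypothetical sketch of the demonstration-collection step described in the paper:
# roll out a pre-trained, sub-optimal deterministic policy for a single trajectory
# of at most 1000 transitions. `teacher_policy` is an assumed callable, not an
# artifact released with the paper.
import gym

def collect_demo(env_id, teacher_policy, max_len=1000, seed=0):
    env = gym.make(env_id)
    env.seed(seed)
    obs, done, demo = env.reset(), False, []
    while not done and len(demo) < max_len:
        action = teacher_policy(obs)                    # deterministic teacher action
        next_obs, reward, done, _ = env.step(action)    # old gym 4-tuple API assumed
        demo.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return demo

# Example: one imperfect trajectory per task, reused across the five random seeds.
# demo = collect_demo("Hopper-v2", teacher_policy)
```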