Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation
Authors: Christopher R. Dance, Julien Perez, Théo Cachet
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results on robotic manipulation and navigation benchmarks, demonstrating DCRL's superior performance compared with state-of-the-art alternatives, as well as its ability to improve on suboptimal demonstrations and to cope with domain shifts. |
| Researcher Affiliation | Industry | 1NAVER LABS Europe, 6 chemin de Maupertuis, Meylan, 38240, France. Website:europe.naverlabs.com. Correspondence to: Théo Cachet <theo.cachet@naverlabs.com>. |
| Pseudocode | Yes | Algorithm 1 Demonstration-conditioned reinforcement learning |
| Open Source Code | No | The paper cites 'Meta-World source code' for the benchmark used (Yu et al., 2019c), but it does not provide any statement or link indicating that the authors' own DCRL implementation code is open-source or publicly available. |
| Open Datasets | Yes | We use Meta-World, a robotic manipulation benchmark, originally designed to assess the performance of metalearning algorithms (Yu et al., 2019b). ... Our second benchmark involves 60 tasks, each corresponding to a maze layout. ... the transition function is computed with Viz Doom (Kempka et al., 2016). |
| Dataset Splits | No | The paper describes training and testing splits over tasks (e.g., 'trained on 45 tasks and tested on 5 hold-out tasks' for Meta-World, and 'train on a fixed set of 50 mazes and test on the remaining 10 mazes' for navigation), but the main text does not describe a separate validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | It takes about one day to train DCRL for both benchmarks, using a Tesla V100 GPU. ... On this benchmark, using an Nvidia 2080 Ti GPU, the execution time of our transformer-based architecture is as follows. |
| Software Dependencies | No | The paper mentions using PPO, Meta-World, MuJoCo, and Viz Doom but does not specify the version numbers for any of these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | No | While the paper describes aspects of the training process (e.g., sampling 5000 demonstrations per task, training for 250 million environment frames, using PPO), it states that 'Full details can be found in the Supplementary Material' regarding hyperparameters, so specific setup values such as learning rates and batch sizes are not provided in the main text. |
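
For readers assessing the Pseudocode row above: the overall shape of demonstration-conditioned RL (Algorithm 1) is a policy that receives both the current observation and an encoding of one or more demonstrations, trained with PPO across tasks. The sketch below is a toy illustration of that loop only; all names (`encode_demos`, `toy_env_step`, `dcrl_rollout`) and the trivial encoder/environment are illustrative stand-ins, not the paper's implementation, and the PPO update itself is omitted.

```python
import random

def encode_demos(demos):
    # Stand-in for the paper's transformer demonstration encoder: here we
    # just summarise demonstrations (lists of (obs, action, reward) triples)
    # by the mean reward they contain.
    rewards = [r for demo in demos for (_, _, r) in demo]
    return sum(rewards) / len(rewards) if rewards else 0.0

def policy(obs, demo_context):
    # Demonstration-conditioned policy: the action depends on both the
    # current observation and the encoded demonstrations.
    return 1 if obs + demo_context > 0 else 0

def toy_env_step(obs, action):
    # Toy stand-in for a benchmark environment such as Meta-World
    # manipulation or maze navigation.
    reward = 1.0 if action == 1 else 0.0
    next_obs = obs + random.uniform(-1.0, 1.0)
    return next_obs, reward

def dcrl_rollout(demos, horizon=10, obs=0.0):
    """Collect one episode with the demonstration-conditioned policy.

    In DCRL, batches of such rollouts (conditioned on demonstrations
    sampled per task) would feed a PPO update, which is omitted here.
    """
    context = encode_demos(demos)
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(obs, context)
        obs, reward = toy_env_step(obs, action)
        total_reward += reward
    return total_reward
```

Because the policy conditions on demonstrations rather than being fine-tuned per task, few-shot imitation at test time amounts to running `dcrl_rollout` with fresh demonstrations from a hold-out task, with no gradient steps.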