DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations

Authors: Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, Kee-Eung Kim

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present the empirical performance of DemoDICE and baseline methods on MuJoCo continuous control environments (Todorov et al., 2012) using the OpenAI Gym (Brockman et al., 2016) framework. We provide experimental results for 4 MuJoCo environments: Hopper, Walker2d, HalfCheetah, and Ant. Our extensive evaluations show that DemoDICE achieves performance competitive to or better than a state-of-the-art off-policy IL algorithm in the offline-IL tasks with expert and imperfect demonstrations.
Researcher Affiliation | Collaboration | Geon-Hyeong Kim1, Seokin Seo2, Jongmin Lee1, Wonseok Jeon3,4, HyeongJoo Hwang2, Hongseok Yang1,2,5, Kee-Eung Kim1,2. 1School of Computing, KAIST, Daejeon, Republic of Korea; 2Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea; 3Mila, Quebec AI Institute; 4School of Computer Science, McGill University; 5Discrete Mathematics Group, Institute for Basic Science (IBS), Daejeon, Republic of Korea. The research that is the basis of this paper was done while the author was at Mila/McGill University, but the author is currently employed by Qualcomm Technologies Inc.
Pseudocode | Yes | Algorithm 1 (Behavioral Cloning from Noisy Demonstrations). Require: noisy expert demonstrations D, policy parameters {θ_k}_{k=1}^K, learning rate ζ. Ensure: ensemble policy π_θ = (1/K) Σ_{k=1}^K π_{θ_k}. Set R̂(s, a) = 1 for (s, a) ∈ D. Split D into K disjoint sets {D_1, D_2, ..., D_K}. for iteration = 1, ..., M do ... (a runnable sketch of this split-and-clone loop follows the table)
Open Source Code | Yes | The code to reproduce our results is available at our GitHub repository: https://github.com/KAIST-AILab/imitation-dice
Open Datasets | Yes | We utilize D4RL datasets (Fu et al., 2020) to construct expert and imperfect demonstrations for our experiments. For each of the MuJoCo environments, we utilize three types of D4RL datasets, whose names end with -expert-v2, -full-replay-v2, or -random-v2. (a data-loading sketch follows the table)
Dataset Splits | No | The paper describes the construction of datasets (e.g., M1, M2, M3 tasks with specific ratios of expert and random trajectories) and refers to training iterations, but it does not specify explicit training/validation/test splits of the kind used in a traditional supervised-learning evaluation protocol during development.
Hardware Specification | No | The paper mentions using MuJoCo environments and the OpenAI Gym framework, but it does not specify the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | While implementation details and hyperparameters are provided (e.g., network sizes and learning rates in Table 1), specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions) are not listed.
Experiment Setup | Yes | Table 1: Configurations of hyperparameters used in our experimental results. γ (discount factor): 0.99; α (regularization coefficient): 0.05; learning rate (actor): 3×10⁻⁵; network size (actor): [256, 256]; batch size: 256; # of training iterations: 1,000,000. (these values are collected into a configuration sketch after the table)
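
The pseudocode excerpt quoted in the Pseudocode row describes behavioral cloning from noisy demonstrations with an ensemble of K policies. Below is a minimal, hypothetical PyTorch sketch of that split-and-clone loop: the demonstrations are split into K disjoint subsets, one policy is cloned per subset with a plain MSE regression loss, and the ensemble averages their outputs. The excerpt initializes the per-sample weights R̂(s, a) to 1 and is truncated before any weight-update steps, so this sketch omits them; all function and variable names are illustrative and not taken from the authors' repository.

import torch
import torch.nn as nn

def make_policy(obs_dim, act_dim, hidden=256):
    # Small MLP actor; the [256, 256] sizes mirror Table 1.
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, act_dim),
    )

def bc_ensemble(states, actions, K=5, iterations=1000, lr=3e-5, batch_size=256):
    # Split the demonstrations into K disjoint subsets D_1, ..., D_K.
    perm = torch.randperm(states.shape[0])
    subsets = torch.chunk(perm, K)
    policies = [make_policy(states.shape[1], actions.shape[1]) for _ in range(K)]
    optimizers = [torch.optim.Adam(p.parameters(), lr=lr) for p in policies]
    for _ in range(iterations):
        for policy, opt, idx in zip(policies, optimizers, subsets):
            batch = idx[torch.randint(len(idx), (batch_size,))]
            # Plain behavioral-cloning regression loss on this subset.
            loss = ((policy(states[batch]) - actions[batch]) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Ensemble policy: uniform average over the K cloned policies.
    return lambda s: torch.stack([p(s) for p in policies]).mean(dim=0)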
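
For the Open Datasets row, the following sketch shows, assuming the standard d4rl/gym API, how the -expert-v2, -full-replay-v2, and -random-v2 datasets could be loaded and pooled into expert and supplementary imperfect demonstrations. The paper constructs its tasks by mixing specific ratios of expert and random trajectories; those ratios are not reproduced here, and the simple pooling below is only illustrative.

import numpy as np
import gym
import d4rl  # noqa: F401  (importing d4rl registers the offline datasets with gym)

def load(name):
    # e.g. "hopper-expert-v2", "hopper-full-replay-v2", "hopper-random-v2"
    env = gym.make(name)
    data = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...
    return data["observations"], data["actions"]

# Expert demonstrations from the -expert-v2 dataset.
expert_obs, expert_act = load("hopper-expert-v2")

# Supplementary imperfect demonstrations, pooled from lower-quality datasets.
imperfect = [load(n) for n in ("hopper-full-replay-v2", "hopper-random-v2")]
imperfect_obs = np.concatenate([o for o, _ in imperfect])
imperfect_act = np.concatenate([a for _, a in imperfect])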
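
Finally, the Table 1 hyperparameters from the Experiment Setup row, collected into a single configuration object; the values are copied verbatim, while the key names and how a training loop would consume them are assumptions of this sketch.

# Hyperparameters from Table 1 of the paper (values verbatim, key names assumed).
config = {
    "gamma": 0.99,                  # discount factor
    "alpha": 0.05,                  # regularization coefficient
    "actor_lr": 3e-5,               # learning rate (actor)
    "actor_hidden": [256, 256],     # network size (actor)
    "batch_size": 256,
    "train_iterations": 1_000_000,  # number of training iterations
}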