DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations

Authors: Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, Kee-Eung Kim

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present the empirical performance of DemoDICE and baseline methods on MuJoCo continuous control environments (Todorov et al., 2012) using the OpenAI Gym (Brockman et al., 2016) framework. We provide experimental results for 4 MuJoCo environments: Hopper, Walker2d, HalfCheetah, and Ant. Our extensive evaluations show that DemoDICE achieves performance competitive to or better than a state-of-the-art off-policy IL algorithm in the offline-IL tasks with expert and imperfect demonstrations.
Researcher Affiliation | Collaboration | Geon-Hyeong Kim1, Seokin Seo2, Jongmin Lee1, Wonseok Jeon3,4, HyeongJoo Hwang2, Hongseok Yang1,2,5, Kee-Eung Kim1,2. 1School of Computing, KAIST, Daejeon, Republic of Korea; 2Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea; 3Mila, Quebec AI Institute; 4School of Computer Science, McGill University; 5Discrete Mathematics Group, Institute for Basic Science (IBS), Daejeon, Republic of Korea. The research that is the basis of this paper was done while the author was at Mila/McGill University, but the author is currently employed by Qualcomm Technologies Inc.
Pseudocode | Yes | Algorithm 1 (Behavioral Cloning from Noisy Demonstrations). Require: noisy expert demonstrations D, policy parameters {θ_k}_{k=1}^K, learning rate ζ. Ensure: ensemble policy π_θ = (1/K) Σ_{k=1}^K π_{θ_k}. Set R̂(s, a) = 1 for (s, a) ∈ D. Split D into K disjoint sets {D_1, D_2, ..., D_K}. for iteration = 1, ..., M do ... (a runnable sketch of this split-and-clone loop follows the table)
Open Source Code | Yes | The code to reproduce our results is available at our GitHub repository: https://github.com/KAIST-AILab/imitation-dice
Open Datasets | Yes | We utilize D4RL datasets (Fu et al., 2020) to construct expert and imperfect demonstrations for our experiments. For each of the MuJoCo environments, we utilize three types of D4RL datasets, whose names end with -expert-v2, -full-replay-v2, or -random-v2. (a data-loading sketch follows the table)
Dataset Splits | No | The paper describes the construction of datasets (e.g., M1, M2, M3 tasks with specific ratios of expert and random trajectories) and refers to training iterations, but it does not specify explicit training/validation/test splits of the kind used in a traditional supervised-learning evaluation protocol during development.
Hardware Specification | No | The paper mentions using MuJoCo environments and the OpenAI Gym framework, but it does not specify the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | While implementation details and hyperparameters are provided (e.g., network sizes and learning rates in Table 1), specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions) are not listed.
Experiment Setup | Yes | Table 1: Configurations of hyperparameters used in our experimental results. γ (discount factor): 0.99; α (regularization coefficient): 0.05; learning rate (actor): 3×10⁻⁵; network size (actor): [256, 256]; batch size: 256; # of training iterations: 1,000,000. (these values are collected into a configuration sketch after the table)
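
The pseudocode excerpt quoted in the Pseudocode row describes behavioral cloning from noisy demonstrations with an ensemble of K policies. Below is a minimal, hypothetical PyTorch sketch of that split-and-clone loop: the demonstrations are split into K disjoint subsets, one policy is cloned per subset with a plain MSE regression loss, and the ensemble averages their outputs. The excerpt initializes the per-sample weights R̂(s, a) to 1 and is truncated before any weight-update steps, so this sketch omits them; all function and variable names are illustrative and not taken from the authors' repository.

import torch
import torch.nn as nn

def make_policy(obs_dim, act_dim, hidden=256):
    # Small MLP actor; the [256, 256] sizes mirror Table 1.
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, act_dim),
    )

def bc_ensemble(states, actions, K=5, iterations=1000, lr=3e-5, batch_size=256):
    # Split the demonstrations into K disjoint subsets D_1, ..., D_K.
    perm = torch.randperm(states.shape[0])
    subsets = torch.chunk(perm, K)
    policies = [make_policy(states.shape[1], actions.shape[1]) for _ in range(K)]
    optimizers = [torch.optim.Adam(p.parameters(), lr=lr) for p in policies]
    for _ in range(iterations):
        for policy, opt, idx in zip(policies, optimizers, subsets):
            batch = idx[torch.randint(len(idx), (batch_size,))]
            # Plain behavioral-cloning regression loss on this subset.
            loss = ((policy(states[batch]) - actions[batch]) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Ensemble policy: uniform average over the K cloned policies.
    return lambda s: torch.stack([p(s) for p in policies]).mean(dim=0)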
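
For the Open Datasets row, the following sketch shows, assuming the standard d4rl/gym API, how the -expert-v2, -full-replay-v2, and -random-v2 datasets could be loaded and pooled into expert and supplementary imperfect demonstrations. The paper constructs its tasks by mixing specific ratios of expert and random trajectories; those ratios are not reproduced here, and the simple pooling below is only illustrative.

import numpy as np
import gym
import d4rl  # noqa: F401  (importing d4rl registers the offline datasets with gym)

def load(name):
    # e.g. "hopper-expert-v2", "hopper-full-replay-v2", "hopper-random-v2"
    env = gym.make(name)
    data = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...
    return data["observations"], data["actions"]

# Expert demonstrations from the -expert-v2 dataset.
expert_obs, expert_act = load("hopper-expert-v2")

# Supplementary imperfect demonstrations, pooled from lower-quality datasets.
imperfect = [load(n) for n in ("hopper-full-replay-v2", "hopper-random-v2")]
imperfect_obs = np.concatenate([o for o, _ in imperfect])
imperfect_act = np.concatenate([a for _, a in imperfect])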
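
Finally, the Table 1 hyperparameters from the Experiment Setup row, collected into a single configuration object; the values are copied verbatim, while the key names and how a training loop would consume them are assumptions of this sketch.

# Hyperparameters from Table 1 of the paper (values verbatim, key names assumed).
config = {
    "gamma": 0.99,                  # discount factor
    "alpha": 0.05,                  # regularization coefficient
    "actor_lr": 3e-5,               # learning rate (actor)
    "actor_hidden": [256, 256],     # network size (actor)
    "batch_size": 256,
    "train_iterations": 1_000_000,  # number of training iterations
}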