Learning to Weight Imperfect Demonstrations

Authors: Yunke Wang, Chang Xu, Bo Du, Honglak Lee

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in the MuJoCo and Atari environments demonstrate that the proposed algorithm outperforms baseline methods in handling imperfect expert demonstrations.
Researcher Affiliation | Collaboration | 1. National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China; 2. School of Computer Science, Faculty of Engineering, The University of Sydney, Australia; 3. EECS Department, University of Michigan, USA; 4. LG AI Research, South Korea.
Pseudocode | No | The paper describes the methodology but does not include any explicit pseudocode blocks or clearly labeled algorithm sections.
Open Source Code | No | The paper cites a third-party implementation (Kostrikov's PPO) that was used, but does not provide an explicit statement or link to source code for its proposed method (WGAIL).
Open Datasets | Yes | "We first conduct experiments on four continuous control tasks in the Mujoco simulator (Todorov et al., 2012): Ant-v2, Hopper-v2, Walker2d-v2, and HalfCheetah-v2. ... we only evaluate WGAIL on five Atari games Beamrider, Pong, Qbert, Seaquest and Hero with one kind of imperfect demonstrations."
Dataset Splits | No | The paper discusses training and testing but does not explicitly mention or specify a validation split.
Hardware Specification | No | The paper does not provide specific details on the hardware used for the experiments, such as CPU or GPU models or memory specifications.
Software Dependencies | No | The paper refers to Kostrikov's implementation of PPO and cites PyTorch, but does not provide specific version numbers for the software dependencies used in its experiments.
Experiment Setup | No | The paper mentions evaluating with five different random seeds and using the default hyperparameters of a third-party PPO implementation, but does not provide specific hyperparameter values or detailed training configurations (e.g., learning rate, batch size, number of epochs) for its own experiments.
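The one evaluation detail the paper does state — reporting results averaged over five random seeds — can be sketched in a few lines. This is a minimal illustration, not the authors' code (none is released): `run_policy` is a hypothetical callable standing in for a trained WGAIL policy rollout, and the dummy policy below exists only to exercise the helper.

```python
import random
import statistics

def evaluate_over_seeds(run_policy, env_name, seeds=(0, 1, 2, 3, 4)):
    """Aggregate episodic returns across several random seeds.

    `run_policy(env_name, seed) -> float` is a hypothetical stand-in for
    rolling out a trained policy in one environment; the paper reports
    results averaged over five seeds but provides no implementation.
    """
    returns = [run_policy(env_name, seed) for seed in seeds]
    return statistics.mean(returns), statistics.stdev(returns)

# Toy stand-in policy: a seeded noisy constant return, so the helper
# can be run without MuJoCo or a trained model.
def dummy_policy(env_name, seed):
    rng = random.Random(seed)
    return 1000.0 + rng.uniform(-50.0, 50.0)

mean_ret, std_ret = evaluate_over_seeds(dummy_policy, "Hopper-v2")
```

Because each per-seed return is seeded, the aggregate is deterministic; in a real reproduction the same pattern would wrap environment resets and policy rollouts with the five seed values.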