Imitation Learning from Purified Demonstrations
Authors: Yunke Wang, Minjing Dong, Yukun Zhao, Bo Du, Chang Xu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on MuJoCo and RoboSuite demonstrate the effectiveness of our method from different aspects. In this section, we conduct extensive experiments to verify the effectiveness of DP-IL in MuJoCo (Todorov et al., 2012) and RoboSuite (Zhu et al., 2020) with different compared methods. The experimental results demonstrate the advantage of DP-IL from different aspects. |
| Researcher Affiliation | Academia | 1School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and Wuhan Institute of Data Intelligence, Wuhan University, China. 2Department of Computer Science, City University of Hong Kong, China. 3School of Computer Science, Faculty of Engineering, The University of Sydney, Australia. |
| Pseudocode | Yes | The pseudocode of the diffusion model's training and purification is available in Algorithm 1 and Algorithm 2. (A hedged sketch of a possible purification loop appears after the table.) |
| Open Source Code | Yes | Our source code and training data will be available at https://github.com/yunke-wang/dp-il. |
| Open Datasets | Yes | We first conduct experiments on MuJoCo benchmarks in OpenAI Gym (Brockman et al., 2016). We also evaluate the robustness of DP-BC on the RoboSuite platform (Zhu et al., 2020) with real-world demonstrations. We use real-world demonstrations by human operators from RoboTurk (Mandlekar et al., 2018). |
| Dataset Splits | No | The paper uses optimal and sub-optimal demonstrations for training but does not explicitly state specific training/validation/test dataset splits, percentages, or sample counts needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions implementing algorithms based on a DDPM repository and using TRPO, but it does not provide specific version numbers for software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | The training epoch is set to be 10000 and the learning rate of ϵϕ is set to 1e-4. We set N = 1000 for all experiments and set the forward process variances to constants increasing linearly from β1 = 1e-4 to βN = 0.02. ...the policy is trained with batch size 256, and the total epoch is set to be 1000. For online imitation learning, the learning rate of the discriminator Dψ and the critic rψ is set to 3e-4. ...The discount rate γ of the sampled trajectory is set to 0.995. The τ (GAE parameter) is set to 0.97. (A sketch of the quoted diffusion schedule appears after the table.) |
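
For reference, the quoted hyperparameters (N = 1000, forward-process variances increasing linearly from β1 = 1e-4 to βN = 0.02) correspond to a standard DDPM linear schedule. Below is a minimal PyTorch sketch of that schedule and the closed-form forward sampling step, assuming the usual DDPM parameterization; the function and variable names are illustrative and not taken from the paper's released code.

```python
import torch

# Hyperparameters quoted in the Experiment Setup row above.
N = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, N)      # forward-process variances, linear in t
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative products used by q(x_t | x_0)

def diffuse(x0: torch.Tensor, n: int) -> torch.Tensor:
    """Sample x_n ~ q(x_n | x_0) in one shot via the standard DDPM identity:
    x_n = sqrt(alpha_bar_n) * x_0 + sqrt(1 - alpha_bar_n) * eps, eps ~ N(0, I).
    Here n is 1-based, matching the β1..βN notation in the quote."""
    noise = torch.randn_like(x0)
    return alpha_bars[n - 1].sqrt() * x0 + (1.0 - alpha_bars[n - 1]).sqrt() * noise
```

In the purification setting the paper describes, `diffuse` would presumably be applied to (potentially imperfect) demonstration samples for a limited number of steps before denoising them back.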
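The purification step itself (Algorithm 2 in the paper) is not reproduced here. Under the assumption that it follows standard DDPM ancestral sampling from a partially diffused demonstration, a denoising loop might look like the sketch below; `eps_model` is a hypothetical stand-in for the trained noise predictor ϵϕ, and the actual algorithm may differ.

```python
import torch

# Same linear schedule as in the previous sketch.
N = 1000
betas = torch.linspace(1e-4, 0.02, N)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def purify(x_n: torch.Tensor, n: int, eps_model) -> torch.Tensor:
    """Standard DDPM ancestral sampling from step n down to 0.

    eps_model(x_t, t) is a hypothetical interface for the trained noise
    predictor; this is a sketch, not the paper's Algorithm 2.
    """
    x = x_n
    for t in range(n, 0, -1):                                  # t is 1-based
        z = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
        eps = eps_model(x, torch.full(x.shape[:1], t))
        coef = betas[t - 1] / (1.0 - alpha_bars[t - 1]).sqrt()
        x = (x - coef * eps) / alphas[t - 1].sqrt() + betas[t - 1].sqrt() * z
    return x
```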