Domain Adaptive Imitation Learning with Visual Observation

Authors: Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our approach outperforms previous algorithms for imitation learning from visual observation with domain shift.
Researcher Affiliation | Collaboration | ¹KAIST, ²UNIST, ³Carnegie Mellon University, ⁴LG AI Research
Pseudocode | Yes | Appendix A provides the details of the loss functions implementing D3IL, and Algorithms 1 and 2 in Appendix B summarise our domain-adaptive IL algorithm.
Open Source Code | No | The paper states: "Our D3IL is implemented based on the code provided by the authors of [6]." This indicates they built upon existing code, but it does not state that their own implementation of the described methodology is open-source or available.
Open Datasets | No | O_S^E is obtained by the expert policy π_E, which is trained for 1 million timesteps in the source domain using SAC. O_S^N is obtained by a policy taking uniformly random actions in the source domain, and O_T^N is obtained by a policy taking uniformly random actions in the target domain. ... For each IL task, the number of observations (i.e., the number of timesteps) for O_S^E, O_S^N, O_T^N is n_demo = 10000, except for the HalfCheetah-to-locked-legs task, where n_demo = 20000.
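As a sketch of the data-collection scheme quoted above, the following toy Python snippet rolls a policy in an environment for n_demo steps and records the observations. `ToyEnv`, `random_policy`, and `collect_observations` are illustrative stand-ins, not the paper's MuJoCo environments or the authors' code.

```python
import random

class ToyEnv:
    """Minimal stand-in environment (illustrative only, not MuJoCo)."""
    def __init__(self, horizon=100):
        self.horizon, self.t = horizon, 0
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), self.t >= self.horizon  # (observation, done)

def random_policy(obs):
    """Uniformly random actions, as used for O_S^N and O_T^N."""
    return random.uniform(-1.0, 1.0)

def collect_observations(env, policy, n_steps):
    """Record n_steps observations, resetting whenever an episode ends."""
    obs_set, obs = [], env.reset()
    for _ in range(n_steps):
        obs, done = env.step(policy(obs))
        obs_set.append(obs)
        if done:
            obs = env.reset()
    return obs_set

# Analogue of O_S^N: n_demo = 10000 observations from a random policy.
O_SN = collect_observations(ToyEnv(), random_policy, 10_000)
```

Swapping in the expert policy π_E for `random_policy` would yield the O_S^E analogue.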
Dataset Splits | No | The paper describes generating and sampling data for training and evaluation, but it does not specify explicit train/validation/test splits with percentages or fixed sample counts; the method uses an online learning setup with continuous data generation.
Hardware Specification | Yes | The proposed method is implemented on TensorFlow 2.0 with CUDA 10.0, and we used two Intel Xeon CPUs and a TITAN Xp GPU as the main computing resources. One can use a GeForce RTX 3090 GPU instead as the computing resource, which requires TensorFlow 2.5 and CUDA 11.4.
Software Dependencies | Yes | The proposed method is implemented on TensorFlow 2.0 with CUDA 10.0, and we used two Intel Xeon CPUs and a TITAN Xp GPU as the main computing resources. One can use a GeForce RTX 3090 GPU instead as the computing resource, which requires TensorFlow 2.5 and CUDA 11.4.
Experiment Setup | Yes | In the first phase, we trained the model for n_epoch_it = 50000 epochs for all IL tasks... We set λ_feat^pred = λ_feat^adv = 0.01, λ_feat^reg = 0.1, λ_img^adv = 1, λ_feat^sim = λ_feat^recon = 1000, λ_img^recon = λ_img^cycle = 100000. ... We used the Adam optimizer [23] for optimizing all networks. We set the learning rate lr = 0.001 and momentum parameters β1 = 0.9 and β2 = 0.999 for encoders, generators, and discriminators. We set lr = 0.0003, β1 = 0.9, and β2 = 0.999 for the actor and critic. For SAC, we set the discount factor γ = 0.99, the parameter ρ = 0.995 for Polyak averaging the Q-network, and the entropy coefficient α = 0.1.
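The quoted hyperparameters can be gathered into a plain-Python configuration sketch; the names `CONFIG` and `polyak_update` are illustrative assumptions, not taken from the authors' code, and the Polyak update is shown on bare floats rather than network weights.

```python
# Hyperparameters as quoted from the paper's experiment setup.
CONFIG = {
    "n_epoch_it": 50_000,  # phase-1 training epochs for all IL tasks
    "loss_weights": {
        "lambda_feat_pred": 0.01, "lambda_feat_adv": 0.01,
        "lambda_feat_reg": 0.1,   "lambda_img_adv": 1,
        "lambda_feat_sim": 1000,  "lambda_feat_recon": 1000,
        "lambda_img_recon": 100_000, "lambda_img_cycle": 100_000,
    },
    # Adam for encoders, generators, and discriminators
    "adam_repr": {"lr": 1e-3, "beta1": 0.9, "beta2": 0.999},
    # Adam for the actor and critic
    "adam_actor_critic": {"lr": 3e-4, "beta1": 0.9, "beta2": 0.999},
    # SAC constants: discount, Polyak coefficient, entropy coefficient
    "sac": {"gamma": 0.99, "rho": 0.995, "alpha": 0.1},
}

def polyak_update(target, online, rho=CONFIG["sac"]["rho"]):
    """Polyak-average target Q-network parameters toward the online network:
    theta_target <- rho * theta_target + (1 - rho) * theta_online."""
    return [rho * t + (1.0 - rho) * o for t, o in zip(target, online)]
```

With ρ = 0.995 the target network moves only 0.5% of the way toward the online network per update, e.g. `polyak_update([1.0], [0.0])` returns `[0.995]`.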