Domain Adaptive Imitation Learning with Visual Observation
Authors: Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our approach outperforms previous algorithms for imitation learning from visual observation with domain shift. |
| Researcher Affiliation | Collaboration | 1KAIST 2UNIST 3Carnegie Mellon University 4LG AI Research |
| Pseudocode | Yes | Appendix A provides the details of the loss functions implementing D3IL, and Algorithms 1 and 2 in Appendix B summarize our domain-adaptive IL algorithm. |
| Open Source Code | No | The paper states: "Our D3IL is implemented based on the code provided by the authors of [6]." This indicates they built upon existing code, but does not state that their own implementation or code for the described methodology is open-source or available. |
| Open Datasets | No | O_S^E is obtained by the expert policy π_E, which is trained for 1 million timesteps in the source domain using SAC. O_S^N is obtained by a policy taking uniformly random actions in the source domain, and O_T^N is obtained by a policy taking uniformly random actions in the target domain. ... For each IL task, the number of observations (i.e., the number of timesteps) for O_S^E, O_S^N, O_T^N is n_demo = 10000, except for the HalfCheetah-to-locked-legs task, where n_demo = 20000. (A hedged sketch of this collection loop appears after the table.) |
| Dataset Splits | No | The paper describes generating and sampling data for training and evaluation, but it does not specify explicit train/validation/test splits with percentages or fixed sample counts; the setup is online learning with continuous data generation. |
| Hardware Specification | Yes | The proposed method is implemented on TensorFlow 2.0 with CUDA 10.0, and we used two Intel Xeon CPUs and a TITAN Xp GPU as the main computing resources. One can use a GeForce RTX 3090 GPU instead as the computing resource, which requires TensorFlow 2.5 and CUDA 11.4. |
| Software Dependencies | Yes | The proposed method is implemented on TensorFlow 2.0 with CUDA 10.0, and we used two Intel Xeon CPUs and a TITAN Xp GPU as the main computing resources. One can use a GeForce RTX 3090 GPU instead as the computing resource, which requires TensorFlow 2.5 and CUDA 11.4. (A version sanity check appears after the table.) |
| Experiment Setup | Yes | In the first phase, we trained the model for n_epoch_it = 50000 epochs for all IL tasks... We set λ_feat^pred = λ_feat^adv = 0.01, λ_feat^reg = 0.1, λ_img^adv = 1, λ_feat^sim = λ_feat^recon = 1000, λ_img^recon = λ_img^cycle = 100000. ... We used the Adam optimizer [23] for optimizing all networks. We set the learning rate lr = 0.001 and momentum parameters β1 = 0.9 and β2 = 0.999 for the encoders, generators, and discriminators. We set lr = 0.0003, β1 = 0.9, and β2 = 0.999 for the actor and critic. For SAC, we set the discount factor γ = 0.99, the parameter ρ = 0.995 for Polyak averaging the Q-network, and the entropy coefficient α = 0.1. (A configuration sketch appears after the table.) |
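
For the Open Datasets row: the paper collects three observation-only datasets (O_S^E, O_S^N, O_T^N) rather than releasing an existing one. Below is a minimal sketch of such a collection loop, assuming a classic Gym-style `reset`/`step` API with image observations; `source_env`, `target_env`, and `expert_policy` are hypothetical names, not identifiers from the paper.

```python
import numpy as np

def collect_observations(env, policy=None, n_demo=10_000):
    """Roll out a policy and return n_demo visual observations.

    If policy is None, actions are sampled uniformly at random,
    matching how O_S^N and O_T^N are described in the paper.
    """
    obs_buffer = []
    obs = env.reset()
    while len(obs_buffer) < n_demo:
        action = env.action_space.sample() if policy is None else policy(obs)
        obs, _, done, _ = env.step(action)
        obs_buffer.append(obs)
        if done:
            obs = env.reset()
    return np.asarray(obs_buffer[:n_demo])

# O_S^E: expert observations in the source domain
# (expert_policy would be a SAC policy trained for 1M timesteps)
# obs_SE = collect_observations(source_env, expert_policy)
# O_S^N: random-action observations in the source domain
# obs_SN = collect_observations(source_env)
# O_T^N: random-action observations in the target domain
# obs_TN = collect_observations(target_env)
```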
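
For the Hardware Specification and Software Dependencies rows: a quick sanity check, assuming only that TensorFlow is installed, to compare the local stack against the two configurations reported in the paper (TensorFlow 2.0 + CUDA 10.0 on a TITAN Xp, or TensorFlow 2.5 + CUDA 11.4 on a GeForce RTX 3090). This is an illustrative convenience, not code from the paper.

```python
import tensorflow as tf

# Reported configurations: TF 2.0 + CUDA 10.0 (TITAN Xp)
# or TF 2.5 + CUDA 11.4 (GeForce RTX 3090).
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
# The experimental alias works on both TF 2.0 and TF 2.5.
print("Visible GPUs:", tf.config.experimental.list_physical_devices("GPU"))
```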
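
For the Experiment Setup row: the reported optimizer settings and SAC constants, transcribed into a TensorFlow 2.x configuration sketch. Only the numeric values come from the paper; the variable names, dictionary keys, and the `polyak_update` helper are assumptions made for illustration.

```python
import tensorflow as tf

# Adam for encoders, generators, and discriminators (lr = 0.001).
repr_optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3, beta_1=0.9, beta_2=0.999)

# Adam for the SAC actor and critic (lr = 0.0003).
actor_optimizer = tf.keras.optimizers.Adam(
    learning_rate=3e-4, beta_1=0.9, beta_2=0.999)
critic_optimizer = tf.keras.optimizers.Adam(
    learning_rate=3e-4, beta_1=0.9, beta_2=0.999)

# SAC constants as reported.
GAMMA = 0.99   # discount factor γ
RHO = 0.995    # Polyak averaging coefficient ρ for the target Q-network
ALPHA = 0.1    # entropy coefficient α

# First-phase loss weights as reported (key names are assumptions).
LOSS_WEIGHTS = {
    "feat_pred": 0.01, "feat_adv": 0.01, "feat_reg": 0.1,
    "img_adv": 1.0, "feat_sim": 1000.0, "feat_recon": 1000.0,
    "img_recon": 100000.0, "img_cycle": 100000.0,
}

def polyak_update(target_vars, online_vars, rho=RHO):
    """Soft update: θ_target ← ρ·θ_target + (1 − ρ)·θ_online."""
    for t, o in zip(target_vars, online_vars):
        t.assign(rho * t + (1.0 - rho) * o)
```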