SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models
Authors: Shenghua Wan, Yucen Wang, Minghao Shao, Ruying Chen, De-Chuan Zhan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments part, we aim to answer the following questions: 1. How good is SeMAIL's performance on learning tasks with complex distractor observations? ... In Figure 3, we show the performance curves of all six visual control tasks. It is clear that SeMAIL outperforms all the baseline methods and enables high sampling efficiency in most of these tasks. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China. Correspondence to: De-Chuan Zhan <zhandc@nju.edu.cn>. |
| Pseudocode | Yes | The pseudo-code of our proposed SeMAIL is provided in Algorithm 1. |
| Open Source Code | Yes | The codes of SeMAIL are released in https://github.com/yixiaoshenghua/SeMAIL. |
| Open Datasets | Yes | We test our algorithm on six visual control tasks, i.e., five locomotion tasks from DeepMind Control (DMC) Suite (Tassa et al., 2018) with videos under the class driving car of the Kinetics dataset (Kay et al., 2017) as background, and another Car Racing task from OpenAI Gym (Brockman et al., 2016). (See the environment-loading sketch after this table.) |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts. It describes training and testing environments, but not the partitioning of data within them for validation. |
| Hardware Specification | Yes | We implement the proposed algorithm with TensorFlow 2 and run all the experiments on NVIDIA RTX 3090 for about 1000 GPU hours. |
| Software Dependencies | Yes | We implement the proposed algorithm with TensorFlow 2 and run all the experiments on NVIDIA RTX 3090 for about 1000 GPU hours. |
| Experiment Setup | Yes | The hidden sizes for the deterministic part and stochastic part are 200 and 30. We use the Adam optimizer to train the network with batches of 64 sequences of length 50. The learning rate for the task and background model is 6e-5, and for the action net, value net, and discriminator is 8e-5. We clip gradient norms to 100 to stabilize the training process. The values of the background-only reconstruction weight λ for the six tasks are 1.5, 0.25, 2, 1.5, 2, and 1, respectively. The imagination horizon H for locomotion tasks is 15, and for Car Racing is 10. We initialize the dataset with 5 randomly collected episodes and train 100 iterations after collecting one episode in environments. We keep the action repeat times as 2 and set the discounting factor as 0.99 for all tasks. (A configuration sketch follows this table.) |
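The Open Datasets row names the two benchmark families. The sketch below shows how such environments are typically instantiated; the specific domain/task pair (`cheetah`/`run`) and the background-compositing step are assumptions for illustration, since the excerpt does not list the exact five locomotion tasks or reproduce the paper's Kinetics-background wrapper.

```python
# Minimal sketch of loading the two benchmark families quoted above.
# The chosen DMC domain/task is illustrative, not the paper's exact list.
import gym                    # OpenAI Gym (Brockman et al., 2016)
from dm_control import suite  # DeepMind Control Suite (Tassa et al., 2018)

# One of the five DMC locomotion tasks (illustrative choice).
dmc_env = suite.load(domain_name="cheetah", task_name="run")
# In SeMAIL's setting, a wrapper (not shown) composites frames from
# Kinetics "driving car" videos (Kay et al., 2017) as the background.

# The sixth task: Car Racing from OpenAI Gym.
car_env = gym.make("CarRacing-v0")
```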
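The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below is a minimal, hypothetical illustration in Python; every name (`SeMAILConfig`, `deter_size`, `bg_recon_lambda`, and so on) is our own label and is not taken from the released SeMAIL code.

```python
# Hypothetical hyperparameter container for the setup quoted above.
# Field names are illustrative; they are NOT from the SeMAIL repository.
from dataclasses import dataclass

@dataclass
class SeMAILConfig:
    # World-model latent sizes ("hidden sizes for the deterministic part
    # and stochastic part are 200 and 30").
    deter_size: int = 200
    stoch_size: int = 30

    # Adam optimizer, batches of 64 sequences of length 50.
    batch_size: int = 64
    seq_length: int = 50
    model_lr: float = 6e-5         # task and background models
    ac_disc_lr: float = 8e-5       # action net, value net, discriminator
    grad_clip_norm: float = 100.0  # gradient-norm clipping

    # Background-only reconstruction weight; the paper quotes
    # 1.5, 0.25, 2, 1.5, 2, and 1 across the six tasks.
    bg_recon_lambda: float = 1.5

    # Rollout and data-collection settings.
    imagination_horizon: int = 15  # 10 for Car Racing
    prefill_episodes: int = 5      # randomly collected seed episodes
    train_iters_per_episode: int = 100
    action_repeat: int = 2
    discount: float = 0.99

# Example: override the imagination horizon for Car Racing.
carracing_cfg = SeMAILConfig(imagination_horizon=10)
```

Since λ varies per task, a per-task lookup keyed by environment name would replace the single `bg_recon_lambda` default; the excerpt does not state which of the quoted values maps to which task, so that mapping is left abstract here.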