SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models

Authors: Shenghua Wan, Yucen Wang, Minghao Shao, Ruying Chen, De-Chuan Zhan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we aim to answer the following questions: 1. How good is SeMAIL's performance on learning tasks with complex distractor observations? ... In Figure 3, we show the performance curves of all six visual control tasks. It is clear that SeMAIL outperforms all the baseline methods and enables high sampling efficiency in most of these tasks.
Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China. Correspondence to: De-Chuan Zhan <zhandc@nju.edu.cn>.
Pseudocode | Yes | The pseudo-code of our proposed SeMAIL is provided in Algorithm 1.
Open Source Code | Yes | The code of SeMAIL is released at https://github.com/yixiaoshenghua/SeMAIL.
Open Datasets | Yes | We test our algorithm on six visual control tasks: five locomotion tasks from the DeepMind Control (DMC) Suite (Tassa et al., 2018) with videos from the driving car class of the Kinetics dataset (Kay et al., 2017) as backgrounds, and a Car Racing task from OpenAI Gym (Brockman et al., 2016).
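For concreteness, the environment construction described in the row above might look like the sketch below. The dm_control and gym calls are standard APIs, but the specific DMC task shown ("cheetah run"), the Gym environment id, and the KineticsBackgroundWrapper are placeholders: this excerpt does not name the five locomotion tasks or specify how the Kinetics clips are composited behind the agent.

```python
# Sketch of the six evaluation environments; placeholder details noted below.
import numpy as np
import gym
from dm_control import suite


class KineticsBackgroundWrapper:
    """Hypothetical wrapper: composites a Kinetics 'driving car' clip behind
    the agent in every rendered frame. The paper does not detail its version;
    this stub shows one common masking approach."""

    def __init__(self, env, video_frames):
        self.env = env
        self.video_frames = video_frames  # list of HxWx3 uint8 frames
        self.t = 0

    def composite(self, frame, agent_mask):
        # Keep agent pixels; replace everything else with the video frame.
        bg = self.video_frames[self.t % len(self.video_frames)]
        self.t += 1
        return np.where(agent_mask[..., None], frame, bg)


# One of the five DMC locomotion tasks (the task choice is a placeholder).
dmc_env = suite.load(domain_name="cheetah", task_name="run")
# video_frames would be decoded from Kinetics "driving car" clips:
# dmc_env = KineticsBackgroundWrapper(dmc_env, video_frames)

# The sixth task: pixel-based Car Racing from OpenAI Gym.
car_racing_env = gym.make("CarRacing-v0")
```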
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts. It describes training and testing environments, but not the partitioning of data within them for validation.
Hardware Specification | Yes | We implement the proposed algorithm with TensorFlow 2 and run all the experiments on NVIDIA RTX 3090 for about 1000 GPU hours.
Software Dependencies | Yes | We implement the proposed algorithm with TensorFlow 2 and run all the experiments on NVIDIA RTX 3090 for about 1000 GPU hours.
Experiment Setup | Yes | The hidden sizes for the deterministic part and stochastic part are 200 and 30. We use the Adam optimizer to train the network with batches of 64 sequences of length 50. The learning rate for the task and background models is 6e-5, and for the action net, value net, and discriminator it is 8e-5. We clip gradient norms to 100 to stabilize training. The values of the background-only reconstruction weight λ are 1.5, 0.25, 2, 1.5, 2, and 1 for the six tasks, respectively. The imagination horizon H is 15 for the locomotion tasks and 10 for Car Racing. We initialize the dataset with 5 randomly collected episodes and train for 100 iterations after each episode collected in the environment. We keep the action repeat at 2 and set the discount factor to 0.99 for all tasks.
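Collected in one place, the quoted hyperparameters amount to the following configuration sketch. The values are the paper's; the dict layout, the key names, and the TensorFlow clipping call are assumptions for illustration, since the paper's code layout is not quoted here.

```python
import tensorflow as tf

# Hyperparameters transcribed from the Experiment Setup quote above.
# Key names are illustrative; the values are the paper's.
config = {
    "deterministic_size": 200,     # deterministic part of the latent state
    "stochastic_size": 30,         # stochastic part of the latent state
    "batch_size": 64,              # sequences per batch
    "sequence_length": 50,         # timesteps per sequence
    "model_lr": 6e-5,              # task and background models
    "actor_value_disc_lr": 8e-5,   # action net, value net, discriminator
    "grad_clip_norm": 100.0,
    # Background-only reconstruction weights, one per task, in the order
    # the six tasks are reported.
    "lambda_bg": [1.5, 0.25, 2.0, 1.5, 2.0, 1.0],
    "imagine_horizon": {"locomotion": 15, "car_racing": 10},
    "prefill_episodes": 5,         # random episodes seeding the dataset
    "train_iters_per_episode": 100,
    "action_repeat": 2,
    "discount": 0.99,
}

# Adam with gradient-norm clipping, as the quote describes. Whether the
# clipping is global or per-variable is not stated; global is assumed here.
model_opt = tf.keras.optimizers.Adam(
    learning_rate=config["model_lr"],
    global_clipnorm=config["grad_clip_norm"],
)
```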