Meta-Imitation Learning by Watching Video Demonstrations

Authors: Jiayi Li, Tao Lu, Xiaoge Cao, Yinghao Cai, Shuo Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: The aims of our experimental evaluation are to answer the following questions: 1. Can our approach allow the robot to learn new skills by observing only one human demonstration, without any robot instruction during training? 2. How does our approach compare to the mainstream meta-imitation baseline that requires robot demonstrations? 3. In A-CycleGAN, what is the impact of coupled training of the generative model and the inverse dynamics model? 4. How important is it that the meta-imitation module takes the latent states as input? 5. What benefits do we gain from the adaptive loss? Our task is challenging: it lacks a significant amount of accurate information about the skills throughout training; in particular, the latent states and the corresponding actions used for meta-policy training are all estimated. In our evaluation, we design two robot manipulation skills: shape-drawing and pushing. These two skills are also challenging: the shape-drawing skill requires the robot to follow the designated shape lines accurately, which has not appeared in prior meta-learning experiments. The pushing skill requires the robot to push the target object to the goal under a new distractor (different from the demonstrations) in the meta-test phase, which is more effective in verifying the generalization of the meta-policy. It is also harder than the task in DAML (Yu et al., 2018b), in which the robot only needs to push the target away under the same distractor as in the demonstrations. To our knowledge, we are the first to propose a meta-imitation learning method using only human video demonstrations during meta-training (referred to as MILV). Consequently, we compare our method with the meta-imitation learning baseline DAML, which can learn a new task from a single video but requires robot demonstrations during meta-training. We also compare with our ablations to understand the complementary performance of each module: decoupled training of the generative model and the inverse dynamics model (referred to as De GI), directly feeding the meta-imitation module with the imagined robot images (Feed Img), and updating the meta-imitation module with the stable loss (WStable).
Researcher Affiliation | Academia | 1 School of Artificial Intelligence, University of Chinese Academy of Sciences; 2 State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences; 3 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences. {lijiayi2019,tao.lu,caoxiaoge2020,yinghao.cai,shuo.wang}@ia.ac.cn
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. The methods are described textually and with diagrams.
Open Source Code | No | The paper does not provide any explicit statement about making its source code available, nor a link to a code repository.
Open Datasets | No | The paper describes datasets the authors collected (e.g., 'The training set consists of 1,320 Sawyer robot demonstrations...', 'The datasets contain 1,000 Sawyer demonstrations...', 'we collected a dataset with 11 objects...'). While they mention using the OpenAI Gym and MuJoCo environments, they do not provide specific access information (links, DOIs, or citations with authors/years for public availability) for the collected demonstration datasets they use for training.
Dataset Splits | Yes | During meta-training, MAML samples a task T and data from D_T, which are randomly partitioned into two sets, D^tr and D^val. ... we separate the estimated latent states l̂^r into two sets, l̂^r_tr and l̂^r_val. (A minimal split sketch follows the table below.)
Hardware Specification | No | The paper mentions robots (e.g., '7-DoF Sawyer robot', '7-DoF Fetch robot', 'UR5 robotic arm') and a camera ('RealSense D455') used in the experimental setup. However, it does not specify any computing hardware, such as specific CPU or GPU models, used for training or inference.
Software Dependencies | No | The paper mentions the OpenAI Gym environment and the MuJoCo physics engine as tools used. However, it does not provide specific version numbers for these or for any other software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA) that would be needed to reproduce the experiments.
Experiment Setup | Yes | In the experiment, the input to the policy is only sequences of 128×128 RGB images, without any information on the robot joints or end-effector. The policy output is the incremental movement of the robot end-effector in 3D space, (Δx, Δy, Δz), where Δx ∈ [0, 5 cm], Δy ∈ [0, 5 cm], and Δz = 0 cm (z is set to a constant 0.3 cm above the table). For shape-drawing tasks, all shapes to be depicted are shown in Figure 7. ... We set λ = 10 and the batch size is 1 in all experiments. These networks are trained from scratch with a learning rate of 0.0002. ... Throughout training, the learning rate remains constant at 0.001. ... The policy used 1 meta-gradient update with step size α = 0.01. (A hedged configuration sketch follows the table below.)
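
The train/validation split quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code: the function name, the data layout, and the 50/50 split fraction are assumptions made purely for illustration.

```python
import random

def partition_task_data(task_demos, train_fraction=0.5, seed=None):
    """Randomly partition one task's demonstrations into D^tr and D^val,
    in the spirit of the MAML-style split quoted above.
    The 50/50 default fraction is an assumption, not stated in the paper."""
    rng = random.Random(seed)
    demos = list(task_demos)
    rng.shuffle(demos)
    cut = int(len(demos) * train_fraction)
    return demos[:cut], demos[cut:]

# The paper applies the same kind of split to the estimated latent
# states l̂^r, yielding l̂^r_tr and l̂^r_val, e.g.:
# l_hat_tr, l_hat_val = partition_task_data(l_hat_r)
```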
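
The hyperparameters and action bounds quoted in the Experiment Setup row can be gathered into a small configuration sketch. The dictionary keys and the clipping helper below are illustrative names, not taken from the paper's code; only the numeric values come from the quoted text.

```python
import numpy as np

# Values quoted from the paper's experiment setup; key names are illustrative.
CONFIG = {
    "image_size": (128, 128, 3),   # RGB policy input
    "gan_lambda": 10.0,            # λ used in all experiments
    "batch_size": 1,
    "network_lr": 2e-4,            # networks trained from scratch
    "policy_lr": 1e-3,             # constant learning rate during training
    "inner_steps": 1,              # one meta-gradient update
    "inner_step_size": 0.01,       # α
}

def clip_action(delta_xyz_cm):
    """Clip a policy output (Δx, Δy, Δz), in centimetres, to the quoted ranges:
    Δx, Δy ∈ [0, 5] cm and Δz = 0 (height held constant above the table)."""
    dx, dy, _ = delta_xyz_cm
    return np.array([np.clip(dx, 0.0, 5.0), np.clip(dy, 0.0, 5.0), 0.0])
```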