Bayesian Imitation Learning for End-to-End Mobile Manipulation
Authors: Yuqing Du, Daniel Ho, Alex Alemi, Eric Jang, Mohi Khansari
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this work we investigate and demonstrate benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator. ... In a real-world office environment, we achieve 96% task success, improving upon the baseline by +16%." (Abstract); see also Section 5, "Experiments". |
| Researcher Affiliation | Collaboration | ¹UC Berkeley (work done while author was at Everyday Robots), ²Everyday Robots, ³Google Research, ⁴Work done while author was at Google. Correspondence to: Yuqing Du <yuqing_du@berkeley.edu>. |
| Pseudocode | No | The paper describes the methods and formulations using mathematical equations and textual explanations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | "Our training dataset consists of a real-world dataset of 2068 demonstrations (≈ 13.5 hours) and a simulated dataset of 500 demonstrations (≈ 2.7 hours), all collected using handheld teleoperation devices." The paper describes the datasets used for training but does not provide concrete access information (e.g., a link, DOI, or specific repository name), implying they are not publicly available. |
| Dataset Splits | No | The paper mentions using "simulated evaluations" to determine suitable models for real-world testing and states "Six of the doors are in the training dataset and four are only used during evaluation, entirely unseen during training." While this implies a validation process and a distinction between training and evaluation doors, it does not provide explicit percentages or sample counts for a distinct validation split of the demonstration dataset used for model training. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models, memory, or cloud computing instance specifications. |
| Software Dependencies | No | The paper names model components such as a ResNet-18 and an MLP, but it does not provide version numbers for any software dependencies, libraries, or frameworks (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | "For a single input image s, we can decompose the training loss as $\mathcal{L} = \underbrace{\mathbb{E}_{z \sim p(z|s)}[-\log q(a|z)]}_{\mathcal{L}_{BC}} + \beta\,\underbrace{\mathbb{E}_{z \sim p(z|s)}\big[D_{KL}[p(z|s)\,\|\,r(z)]\big]}_{\mathcal{L}_{KL}}$ (Eq. 4) ... The rate is weighted by β and controls the bottlenecking tradeoff. ... we find it helpful to linearly anneal β from 0 within the first 3000 steps of training. ... We parameterize the stochastic encoder p(z|s) using a ResNet-18 (He et al., 2015) that predicts the mean and covariance of a multivariate Gaussian distribution on $\mathbb{R}^{64}$. The action decoder q(a|z) is a 2-layer MLP, and the learned prior r(z) is a mixture of multivariate Gaussians on $\mathbb{R}^{64}$ with 512 components and a learnable mean and diagonal covariance matrix. ... During training we take 8 samples from the stochastic embedding per input and compute the average VIB rate loss." |
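
The quoted setup is concrete enough to sketch the training objective in code. The block below is a minimal PyTorch sketch, not the authors' implementation (no code is released): the action dimension, the unit-variance Gaussian decoder head, the β value, and all names (`StochasticEncoder`, `ActionDecoder`, `LearnedPrior`, `vib_loss`) are assumptions introduced for illustration. Only the 64-dimensional latent, the ResNet-18 encoder predicting a diagonal Gaussian, the 2-layer MLP decoder, the 512-component learned mixture prior, the 8-sample rate estimate, and the 3000-step linear β warm-up come from the paper.

```python
# Minimal sketch of the VIB-style imitation loss described above.
# All module/variable names and hyperparameters not quoted from the paper
# (ACTION_DIM, beta_max, the unit-variance decoder head) are assumptions.
import torch
import torch.nn as nn
import torch.distributions as D
from torchvision.models import resnet18

Z_DIM, N_MIX, ACTION_DIM = 64, 512, 7   # ACTION_DIM is an assumption


class StochasticEncoder(nn.Module):
    """ResNet-18 that predicts mean and diagonal covariance of p(z|s)."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, 2 * Z_DIM)
        self.backbone = backbone

    def forward(self, image):
        mean, log_std = self.backbone(image).chunk(2, dim=-1)
        return D.Normal(mean, log_std.exp())          # diagonal Gaussian on R^64


class ActionDecoder(nn.Module):
    """2-layer MLP q(a|z); a unit-variance Gaussian head is assumed here."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, ACTION_DIM))

    def forward(self, z):
        return D.Normal(self.net(z), 1.0)


class LearnedPrior(nn.Module):
    """Mixture of 512 diagonal Gaussians r(z) with learnable parameters."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(N_MIX))
        self.means = nn.Parameter(torch.randn(N_MIX, Z_DIM) * 0.1)
        self.log_stds = nn.Parameter(torch.zeros(N_MIX, Z_DIM))

    def forward(self):
        comps = D.Independent(D.Normal(self.means, self.log_stds.exp()), 1)
        return D.MixtureSameFamily(D.Categorical(logits=self.logits), comps)


def vib_loss(encoder, decoder, prior, image, action, step,
             beta_max=0.01, anneal_steps=3000, n_samples=8):
    """L = L_BC + beta * L_KL, with beta linearly annealed from 0 and the
    rate estimated from 8 posterior samples per input (beta_max assumed)."""
    p_z = encoder(image)                              # p(z|s)
    z = p_z.rsample((n_samples,))                     # shape (8, B, 64)
    # Behavior-cloning term: average negative log-likelihood of the action.
    l_bc = -decoder(z).log_prob(action).sum(-1).mean()
    # Rate term: Monte-Carlo estimate of KL[p(z|s) || r(z)].
    r_z = prior()
    l_kl = (p_z.log_prob(z).sum(-1) - r_z.log_prob(z)).mean()
    beta = beta_max * min(step / anneal_steps, 1.0)   # linear warm-up from 0
    return l_bc + beta * l_kl
```

The KL to a 512-component mixture prior has no closed form, which is presumably why the paper averages the rate over 8 samples from the stochastic embedding; the sketch mirrors that with a simple Monte-Carlo estimate of the rate term.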