Multi-View Masked World Models for Visual Robotic Manipulation
Authors: Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MV-MWM on challenging visual robotic manipulation tasks from RLBench (James et al., 2020), a standard benchmark for vision-based robotics which has been shown to serve as a proxy for real-robot experiments (James & Davison, 2022). Furthermore, we evaluate the zero-shot performance of our method in the real world by transferring the trained agent to control real robots. |
| Researcher Affiliation | Collaboration | Younggyo Seo*¹, Junsu Kim*¹, Stephen James², Kimin Lee³, Jinwoo Shin¹, Pieter Abbeel⁴; *equal contribution; ¹KAIST, ²Dyson Robot Learning Lab, ³Google Research, ⁴UC Berkeley. Correspondence to: Younggyo Seo <younggyo.seo@kaist.ac.kr>. |
| Pseudocode | Yes | Algorithm 1 Multi-View Masked World Models. Key differences to MWM (Seo et al., 2022a) in gray. |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is released. It mentions 'We build our implementation upon the official implementation of MWM (Seo et al., 2022a)' but this is not a direct release statement for their own code. |
| Open Datasets | Yes | We evaluate MV-MWM on challenging visual robotic manipulation tasks from RLBench (James et al., 2020), a standard benchmark for vision-based robotics which has been shown to serve as a proxy for real-robot experiments (James & Davison, 2022). |
| Dataset Splits | No | The paper mentions 'evaluated using the last five model checkpoints' and 'stratified bootstrap confidence interval' but does not provide specific details on train/validation/test splits for the datasets themselves, such as percentages or sample counts. |
| Hardware Specification | Yes | We use 24 CPU cores (Intel Xeon CPU @ 2.2GHz) and 1 GPU (NVIDIA A100 40GB GPU) for our experiments. |
| Software Dependencies | No | The paper mentions the 'tfimm' library and the 'Adam optimizer (Kingma & Ba, 2015)' but does not provide specific version numbers for these software components or any other key libraries. |
| Experiment Setup | Yes | We use the same set of hyperparameters for all experiments. We provide more implementation details in Appendix A. Hyperparameters (reported in Table 2): autoencoder learning rate 3×10⁻⁴; autoencoder masking ratio 0.95; autoencoder ViT encoder size 8 layers, 4 heads, 256 units; world model batch size 36. |
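
For readability, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration dictionary. This is a minimal sketch only: the dictionary and key names (`MVMWM_HYPERPARAMS`, `autoencoder`, `vit_encoder`, `world_model`) are illustrative assumptions, not the configuration keys used in the authors' codebase.

```python
# Minimal sketch: the hyperparameters quoted from the paper's Table 2,
# gathered into a plain Python dict. All names are assumed for illustration;
# the official MV-MWM implementation may organize its configuration differently.
MVMWM_HYPERPARAMS = {
    "autoencoder": {
        "learning_rate": 3e-4,   # autoencoder learning rate (3 x 10^-4)
        "masking_ratio": 0.95,   # autoencoder masking ratio
        "vit_encoder": {         # autoencoder ViT encoder size
            "layers": 8,
            "heads": 4,
            "units": 256,
        },
    },
    "world_model": {
        "batch_size": 36,        # world model batch size
    },
}

if __name__ == "__main__":
    # Quick sanity check: print the collected settings.
    print(MVMWM_HYPERPARAMS)
```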