Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

Authors: Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Hangjie Shi, Suhaila Shakiah, Reza Ghanadan, William Yang Wang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate the efficacy of our method on the VIMA-BENCH (Jiang et al., 2023) and establish a new state-of-the-art (10% improvement in success rate). Moreover, we demonstrate that our model exhibits remarkable in-context learning ability. Project page: https://midas-icml.github.io/. We compare our method with various baselines from the VIMA paper (Jiang et al., 2023) on the VIMA-BENCH. All baseline methods only conduct multi-task imitation learning without pretraining. We conduct extensive experiments to study how our model design and training pipeline impact robot manipulation, focusing on the effectiveness of our pretraining strategy and prompt encoding. We also examine the impact of data scaling and model size. Appendix A presents individual task success rates for all methods and further ablates the decoder-only architecture of our model. Appendix E studies the effect of the number of gradient steps.
Researcher Affiliation | Collaboration | Work done during an internship at Amazon AGI. 1Department of Computer Science, University of California, Santa Barbara, USA 2Amazon AGI 3Department of Computer Science, University of California, Santa Cruz, USA. Correspondence to: Jiachen Li <jiachen li@ucsb.edu>.
Pseudocode | Yes | The pseudo-code (Algorithm 1) and detailed hyper-parameters (HP) are available in Appendix B.
Open Source Code | No | The paper provides a 'Project page: https://midas-icml.github.io/', which is typically a demonstration or overview page and is not stated to be a code repository. There is no explicit statement of code release for the method described in the paper.
Open Datasets | Yes | Empirically, we evaluate the efficacy of our method on the VIMA-BENCH (Jiang et al., 2023). VIMA-BENCH (Jiang et al., 2023) is built on top of the Ravens simulator (Zeng et al., 2021; Shridhar et al., 2023) and contains 17 types of tabletop manipulation tasks. Expert demonstrations are provided for 13 tasks as the training data, with 50K trajectories per task.
Dataset Splits | Yes | VIMA-BENCH establishes a four-level protocol to evaluate progressively stronger generalization, ranging from placement generalization (L1) and combinatorial generalization (L2) to novel object generalization (L3) and novel task generalization (L4). Expert demonstrations are provided for 13 tasks as the training data, with 50K trajectories per task. The other 4 tasks are included in the L4 task suite (see the evaluation-protocol sketch after this table).
Hardware Specification | Yes | We conduct our experiments on cluster nodes, each with 8 NVIDIA A10G GPUs.
Software Dependencies | No | The paper mentions software components such as a pretrained LM (T5-base) and the AdamW optimizer, but does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | The pseudo-code (Algorithm 1) and detailed hyper-parameters (HP) are available in Appendix B. Table 19 presents the HP for our training pipeline, including Learning Rate (LR) 1e-4, Batch Size 128, Training Epochs, Warmup Steps, Dropout, and Optimizer AdamW (see the optimizer sketch after this table).
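
The four-level evaluation protocol quoted in the Dataset Splits row can be summarized in a short sketch. This is a minimal Python example, not VIMA-BENCH's actual API: `evaluate_policy(level)` is a hypothetical callable assumed to roll out a trained policy on one level's task suite and return its mean success rate.

```python
from typing import Callable, Dict

# The four VIMA-BENCH generalization levels described in the paper.
EVAL_LEVELS = {
    "L1": "placement generalization",
    "L2": "combinatorial generalization",
    "L3": "novel object generalization",
    "L4": "novel task generalization",  # the 4 held-out tasks belong to this suite
}

def evaluate_by_level(evaluate_policy: Callable[[str], float]) -> Dict[str, float]:
    """Evaluate a trained policy on each generalization level.

    `evaluate_policy` is a hypothetical stand-in (not part of any documented
    VIMA-BENCH interface shown above) that returns a success rate in [0, 1]
    for the given level.
    """
    results: Dict[str, float] = {}
    for level, description in EVAL_LEVELS.items():
        success_rate = evaluate_policy(level)
        print(f"{level} ({description}): {success_rate:.1%}")
        results[level] = success_rate
    return results
```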
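
The hyper-parameters quoted from Table 19 (LR 1e-4, Batch Size 128, AdamW, warmup) map onto a standard PyTorch optimizer setup. The sketch below assumes a linear warmup schedule and uses a placeholder warmup-step count, since the exact warmup, dropout, and epoch values are only given in the paper's Appendix B.

```python
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Values quoted in the paper's Table 19 (Appendix B).
LEARNING_RATE = 1e-4
BATCH_SIZE = 128  # intended for the DataLoader feeding the training loop

# Placeholder: the exact warmup-step count is only listed in Appendix B.
WARMUP_STEPS = 1000

def build_optimizer_and_scheduler(model: nn.Module):
    """AdamW with a linear warmup ramp, as a generic stand-in for the paper's schedule."""
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)

    def warmup(step: int) -> float:
        # Scale the LR linearly from ~0 up to LEARNING_RATE over WARMUP_STEPS updates, then hold.
        return min(1.0, (step + 1) / WARMUP_STEPS)

    scheduler = LambdaLR(optimizer, lr_lambda=warmup)
    return optimizer, scheduler

# Usage sketch: call scheduler.step() after each optimizer.step()
# while iterating over batches of size BATCH_SIZE.
```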