VIMA: Robot Manipulation with Multimodal Prompts

Authors: Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency. It outperforms alternative designs in the hardest zero-shot generalization setting by up to 2.9x task success rate given the same training data. With 10x less training data, VIMA still performs 2.7x better than the best competing variant.
Researcher Affiliation Collaboration 1Stanford University; 2Macalester College, now at Allen Institute for AI; 3NVIDIA; 4Caltech; 5Tsinghua; 6UT Austin. Work done during the first author's internship at NVIDIA.
Pseudocode Yes Pseudocode 1: Cross-attention operation that conditions the trajectory history on the prompt. We repeatedly alternate cross-attention and self-attention to model the trajectory given a specific task. (An illustrative sketch of this alternating pattern is given below the table.)
Open Source Code Yes Code and video demos are available at vimalabs.github.io.
Open Datasets Yes We open-source the simulation environment, training dataset, algorithm code, and pre-trained model checkpoints to ensure reproducibility and facilitate future work from the community. These materials along with video demos are available at vimalabs.github.io.
Dataset Splits Yes After training, we select model checkpoints for evaluation based on the aggregated accuracy on a held-out validation set.
Hardware Specification Yes All experiments are conducted on cluster nodes, each with 8 NVIDIA V100 GPUs.
Software Dependencies Yes We implement all models in PyTorch (Paszke et al., 2019) and adapt Transformer-related implementation from Wolf et al. (2019).
Experiment Setup Yes Training hyperparameters are provided in Table 7.
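The alternating cross-attention/self-attention recipe quoted in the Pseudocode row can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the released VIMA code: the class names (XAttnGPTBlock, XAttnGPTDecoder), layer count, and dimensions are assumptions; only the overall pattern, trajectory tokens cross-attending to the encoded prompt and then causally self-attending over the history, follows the paper's description.

```python
# Illustrative sketch (assumed names and sizes), not the official VIMA implementation.
import torch
import torch.nn as nn


class XAttnGPTBlock(nn.Module):
    """One alternating block: cross-attention to the prompt, then causal self-attention."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, traj: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # Condition the trajectory tokens on the encoded multimodal prompt.
        x, _ = self.cross_attn(query=self.norm1(traj), key=prompt, value=prompt)
        traj = traj + x
        # Causal self-attention over the trajectory history.
        T = traj.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=traj.device), diagonal=1
        )
        h = self.norm2(traj)
        x, _ = self.self_attn(h, h, h, attn_mask=causal)
        return traj + x


class XAttnGPTDecoder(nn.Module):
    """Stack of alternating cross-/self-attention blocks over the trajectory."""

    def __init__(self, n_layers: int = 4, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            XAttnGPTBlock(d_model, n_heads) for _ in range(n_layers)
        )

    def forward(self, traj: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            traj = block(traj, prompt)
        return traj  # fed to an action head to predict the next motor action


# Usage: prompt tokens would come from a prompt encoder; trajectory tokens
# interleave observation and action embeddings (shapes here are placeholders).
if __name__ == "__main__":
    prompt_tokens = torch.randn(2, 20, 512)  # (batch, prompt length, d_model)
    traj_tokens = torch.randn(2, 12, 512)    # (batch, history length, d_model)
    decoder = XAttnGPTDecoder()
    out = decoder(traj_tokens, prompt_tokens)
    print(out.shape)  # torch.Size([2, 12, 512])
```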