Any2Policy: Learning Visuomotor Policy with Any-Modality
Authors: Yichen Zhu, Zhicai Ou, Feifei Feng, Jian Tang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive validation of our proposed unified-modality embodied agent on several simulation benchmarks, including Franka Kitchen and ManiSkill2, as well as in our real-world settings. Our experiments showcase the promising capability of building embodied agents that can adapt to diverse multi-modal inputs in a unified framework. |
| Researcher Affiliation | Industry | Yichen Zhu, Zhicai Ou, Feifei Feng, Jian Tang (Midea Group) |
| Pseudocode | No | The paper contains architectural diagrams (Figure 1, Figure 2) but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured code-like procedures. |
| Open Source Code | Yes | Our project is at any2policy.github.io/. The data will be attached in the webpage. |
| Open Datasets | Yes | In support of this project, we are releasing a substantial real-world dataset consisting of 30 tasks, where each task includes 30 trajectories, all annotated with multi-modal instructions and observations, mirroring the setup used in our experiments. The purpose of this dataset is to foster and encourage future research in the area of multi-modal embodied agents. Our dataset, RoboAny, stands out as the first to support a comprehensive range of modalities in robotics. Specifically, Franka Kitchen [92] uses text-image and ManiSkill2 [94] uses text-image and text-{image, point cloud}. The data will be attached in the webpage. |
| Dataset Splits | Yes | The dataset is divided into training, validation, and testing subsets, with a split of 7/1/2, respectively. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | All models are trained on A100 GPUs, implemented in PyTorch [111]. We report the compute resources. |
| Software Dependencies | No | All models are trained on A100 GPUs, implemented in PyTorch [111]. |
| Experiment Setup | Yes | We use an initial learning rate of 3e-5 with the AdamW [107] optimizer, a weight decay of 1e-6, and a linearly decaying learning rate scheduler with a warm-up covering the initial 2% of the total training time [108]. We apply gradient clipping of 1.0. The Franka Kitchen models are trained for 40K steps. We use a weight decay of 1e-6 and a cosine learning rate scheduler with warm-up steps covering 2% of the total steps. Gradient clipping of 1.0 is also applied. We use the Adam optimizer with initial learning rates of 1e-3 and 3e-4 for Franka Kitchen and ManiSkill2, respectively. (A hedged PyTorch sketch of this optimizer setup follows the table.) |
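The 7/1/2 train/validation/test split reported in the "Dataset Splits" row could be reproduced with a sketch like the one below. The function name `split_trajectories`, the seeding, and the per-task trajectory list are illustrative assumptions, not details taken from the paper.

```python
import random

def split_trajectories(trajectories, seed=0):
    """Split a list of trajectories into train/val/test with a 7/1/2 ratio."""
    rng = random.Random(seed)
    shuffled = trajectories[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = 7 * n // 10   # 70% training
    n_val = n // 10         # 10% validation; remainder (20%) goes to test
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }

# Example: 30 trajectories per task -> 21 train / 3 val / 6 test.
splits = split_trajectories(list(range(30)))
assert [len(splits[k]) for k in ("train", "val", "test")] == [21, 3, 6]
```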
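The optimizer settings quoted in the "Experiment Setup" row (AdamW, initial learning rate 3e-5, weight decay 1e-6, linear decay with a warm-up over the first 2% of training, gradient clipping at 1.0, 40K steps for Franka Kitchen) could be wired up roughly as below. This is a minimal sketch, not the authors' code: `policy` is a placeholder module standing in for the Any2Policy network, and the dummy loss exists only so the loop runs standalone.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters quoted in the "Experiment Setup" row.
policy = torch.nn.Linear(512, 7)        # placeholder, not the real policy network
total_steps = 40_000                    # Franka Kitchen training steps
warmup_steps = int(0.02 * total_steps)  # warm-up over the first 2% of training

optimizer = AdamW(policy.parameters(), lr=3e-5, weight_decay=1e-6)

def lr_lambda(step):
    # Linear warm-up followed by a linear decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    # A real run would compute the behaviour-cloning loss on a batch of
    # multi-modal observations; a dummy quadratic loss is used here instead.
    loss = policy(torch.randn(8, 512)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)  # clip at 1.0
    optimizer.step()
    scheduler.step()
```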