Any2Policy: Learning Visuomotor Policy with Any-Modality

Authors: Yichen Zhu, Zhicai Ou, Feifei Feng, Jian Tang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive validation of our proposed unified-modality embodied agent using several simulation benchmarks, including Franka Kitchen and ManiSkill2, as well as in our real-world settings. Our experiments showcase the promising capability of building embodied agents that can adapt to diverse multi-modal inputs in a unified framework.
Researcher Affiliation | Industry | Yichen Zhu, Zhicai Ou, Feifei Feng, Jian Tang (Midea Group)
Pseudocode | No | The paper contains architectural diagrams (Figure 1, Figure 2) but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured code-like procedures.
Open Source Code | Yes | Our project is at any2policy.github.io/. The data will be posted on the webpage.
Open Datasets | Yes | In support of this project, we are releasing a substantial real-world dataset consisting of 30 tasks, where each task includes 30 trajectories, all annotated with multi-modal instructions and observations, mirroring the setup used in our experiments. The purpose of this dataset is to foster and encourage future research in the area of multi-modal embodied agents. Our dataset, RoboAny, stands out as the first to support a comprehensive range of modalities in robotics. Specifically, Franka Kitchen [92] uses text-image and ManiSkill2 [94] uses text-image and text-{image, point cloud}. The data will be posted on the webpage.
Dataset Splits | Yes | The dataset is divided into training, validation, and testing subsets with a 7/1/2 split (see the split sketch after the table).
Hardware Specification | Yes | All models are trained on A100 GPUs, implemented in PyTorch [111]. We report the compute resources.
Software Dependencies | No | The paper states only that all models are trained on A100 GPUs and implemented in PyTorch [111]; no library versions or a full dependency list are provided.
Experiment Setup | Yes | We use an initial learning rate of 3e-5 with the AdamW [107] optimizer, a weight decay of 1e-6, and a linearly decaying learning rate scheduler with a warm-up covering the initial 2% of the total training time [108]. We apply gradient clipping of 1.0. The Franka Kitchen models are trained for 40K steps. We use a weight decay of 1e-6 and a cosine learning rate scheduler with warm-up over 2% of total steps; gradient clipping of 1.0 is also applied. We use the Adam optimizer with initial learning rates of 1e-3 and 3e-4 for Franka Kitchen and ManiSkill2, respectively (see the training-loop sketch after the table).
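
The 7/1/2 split reported in the Dataset Splits row is straightforward to mirror in code. Below is a minimal sketch, assuming the released trajectories can be enumerated as a flat list per task; the function name split_7_1_2 and the per-task list of 30 trajectories are illustrative, not taken from the paper's released code.

import random

def split_7_1_2(trajectories, seed=0):
    """Shuffle a list of trajectories and split it 70/10/20."""
    items = list(trajectories)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Example: 30 trajectories per task, as in the released dataset.
train, val, test = split_7_1_2(range(30))
print(len(train), len(val), len(test))  # 21 3 6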
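
The Experiment Setup row quotes two scheduler descriptions (linear decay and cosine); the sketch below instantiates only the first configuration in PyTorch: AdamW at learning rate 3e-5, weight decay 1e-6, linear warm-up over the initial 2% of steps followed by linear decay, and gradient clipping at 1.0. The policy network, batch, and loss are placeholders, since the paper's actual training code is not reproduced in this review.

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

policy = torch.nn.Linear(512, 7)   # stand-in for the actual policy network
total_steps = 40_000               # Franka Kitchen is trained for 40K steps
warmup_steps = int(0.02 * total_steps)

optimizer = AdamW(policy.parameters(), lr=3e-5, weight_decay=1e-6)

def lr_lambda(step: int) -> float:
    # Linear warm-up over the first 2% of steps, then linear decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = policy(torch.randn(8, 512)).pow(2).mean()  # dummy batch and loss
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping of 1.0, as quoted in the Experiment Setup row.
    torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()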