HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid
Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed approach." and "In our extensive experiments, we train HumanVLA in Isaac Gym [28] with tasks from HITR. Results demonstrate the effectiveness of our method in generalized object rearrangement and vision-language perception." |
| Researcher Affiliation | Collaboration | Xinyu Xu (1, 2), Yizheng Zhang (2), Yong-Lu Li (1), Lei Han (2), Cewu Lu (1); (1) Shanghai Jiao Tong University, (2) Tencent Robotics X |
| Pseudocode | No | The paper describes its methods using text and block diagrams (e.g., Figures 2 and 3), but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/AllenXuuu/HumanVLA. |
| Open Datasets | Yes | Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively. ... For the motion dataset used in training, we utilize OMOMO [24] and a locomotion subset from SAMP [14]. |
| Dataset Splits | No | Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively. |
| Hardware Specification | Yes | The teacher policy is optimized using Proximal Policy Optimization [40] and takes two days on eight Tesla V100 GPUs to converge. The student policy is trained using DAgger [39] and takes one day on two GPUs. |
| Software Dependencies | No | We conduct experiments in parallel environments simulated using Isaac Gym [28], with neural networks implemented via PyTorch. The paper does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | "We provide a hyperparameter table for HumanVLA-Teacher training in Tab. 5. It is shared for both the carry curriculum pre-training and rearrangement learning." and "We provide a hyperparameter table for HumanVLA training in Tab. 6." |