HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed approach." and "In our extensive experiments, we train HumanVLA in Isaac Gym [28] with tasks from HITR. Results demonstrate the effectiveness of our method in generalized object rearrangement and vision-language perception."
Researcher Affiliation | Collaboration | Xinyu Xu (1,2), Yizheng Zhang (2), Yong-Lu Li (1), Lei Han (2), Cewu Lu (1); 1 Shanghai Jiao Tong University, 2 Tencent Robotics X
Pseudocode | No | The paper describes its methods using text and block diagrams (e.g., Figures 2 and 3) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/AllenXuuu/HumanVLA."
Open Datasets | Yes | "Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively. [...] For the motion dataset used in training, we utilize OMOMO [24] and a locomotion subset from SAMP [14]."
Dataset Splits | No | "Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively." (See the split sketch after this table.)
Hardware Specification | Yes | "The teacher policy is optimized using Proximal Policy Optimization [40] and takes two days on eight Tesla V100 GPUs to converge. The student policy is trained using DAgger [39] and takes one day on two GPUs." (See the distillation sketch after this table.)
Software Dependencies | No | "We conduct experiments in parallel environments simulated using Isaac Gym [28], with neural networks implemented via PyTorch." The paper does not provide specific version numbers for these software components. (See the version-recording sketch after this table.)
Experiment Setup | Yes | "We provide a hyperparameter table for HumanVLA-Teacher training in Tab. 5. It is shared for both the carry curriculum pre-training and rearrangement learning." and "We provide a hyperparameter table for HumanVLA training in Tab. 6."
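
On the Dataset Splits row: the paper reports only the 552/63 train/test counts quoted above. Below is a minimal sketch of how a deterministic 9:1 split over the 615 HITR tasks could be recorded and shared; the file names, the task-id list format, and the seed value are assumptions for illustration, not details from the paper.

```python
import json
import random

# Hypothetical: reproduce and record a 9:1 train/test split of the 615 HITR
# tasks. The file names, task-id format, and seed are assumptions; the paper
# only reports the 552/63 subset sizes, not how tasks were assigned.
SEED = 0
TEST_FRACTION = 0.1

with open("hitr_tasks.json") as f:              # assumed: a list of task identifiers
    task_ids = json.load(f)

rng = random.Random(SEED)                       # fixed seed -> identical split every run
shuffled = task_ids[:]
rng.shuffle(shuffled)

n_test = round(len(shuffled) * TEST_FRACTION)   # roughly 63 of 615 tasks
test_tasks = sorted(shuffled[:n_test])
train_tasks = sorted(shuffled[n_test:])

with open("hitr_split.json", "w") as f:
    json.dump({"train": train_tasks, "test": test_tasks}, f, indent=2)

print(f"train: {len(train_tasks)} tasks, test: {len(test_tasks)} tasks")
```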
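On the Hardware Specification row: the quoted passage describes a two-stage pipeline in which a PPO-trained teacher is distilled into a student with DAgger. The sketch below shows one hypothetical DAgger-style distillation update in PyTorch: the frozen teacher labels states visited by the student, and the student regresses onto those actions. Network sizes, observation handling, and all identifiers are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Minimal DAgger-style distillation step: the student acts in the environment,
# the frozen teacher labels the visited states, and the student is regressed
# onto the teacher's actions. All shapes and names here are hypothetical.
class MLPPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def dagger_update(student, teacher, student_obs, teacher_obs, optimizer):
    """One imitation update on states visited by the student policy."""
    with torch.no_grad():
        target_actions = teacher(teacher_obs)   # teacher labels from privileged state
    pred_actions = student(student_obs)         # student sees its own observations
    loss = nn.functional.mse_loss(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```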
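On the Software Dependencies row: no version numbers are given for Isaac Gym or PyTorch. The snippet below sketches one way to record the exact software stack alongside a training run; it assumes Isaac Gym may or may not expose a `__version__` attribute, so that lookup is guarded.

```python
import json
import platform

import torch

# Record the software stack used for training so results can be tied to
# specific versions. Isaac Gym does not always expose a version string,
# so that lookup is guarded (an assumption, not something the paper states).
versions = {
    "python": platform.python_version(),
    "torch": torch.__version__,
    "cuda": torch.version.cuda,
    "cudnn": torch.backends.cudnn.version(),
}

try:
    import isaacgym
    versions["isaacgym"] = getattr(isaacgym, "__version__", "unknown")
except ImportError:
    versions["isaacgym"] = "not installed"

with open("environment_versions.json", "w") as f:
    json.dump(versions, f, indent=2)

print(json.dumps(versions, indent=2))
```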