reproducibilityindex.ai

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed approach." and "In our extensive experiments, we train HumanVLA in Isaac Gym [28] with tasks from HITR. Results demonstrate the effectiveness of our method in generalized object rearrangement and vision-language perception."
Researcher Affiliation	Collaboration	Xinyu Xu (1, 2), Yizheng Zhang (2), Yong-Lu Li (1), Lei Han (2), Cewu Lu (1). (1) Shanghai Jiao Tong University; (2) Tencent Robotics X
Pseudocode	No	The paper describes its methods using text and block diagrams (e.g., Figures 2 and 3), but does not include formal pseudocode or algorithm blocks.
Open Source Code	Yes	The code is available at https://github.com/AllenXuuu/HumanVLA.
Open Datasets	Yes	Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively. ... For the motion dataset used in training, we utilize OMOMO [24] and a locomotion subset from SAMP [14].
Dataset Splits	No	Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively.
Hardware Specification	Yes	The teacher policy is optimized using Proximal Policy Optimization [40] and takes two days on eight Tesla V100 GPUs to converge. The student policy is trained using DAgger [39] and takes one day on two GPUs.
Software Dependencies	No	We conduct experiments in parallel environments simulated using Isaac Gym [28], with neural networks implemented via PyTorch. The paper does not provide specific version numbers for these software components.
Experiment Setup	Yes	"We provide a hyperparameter table for HumanVLA-Teacher training in Tab. 5. It is shared for both the carry curriculum pre-training and rearrangement learning." and "We provide a hyperparameter table for HumanVLA training in Tab. 6."
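
The Software Dependencies row above notes that the paper names Isaac Gym and PyTorch without version numbers. Anyone attempting a reproduction may want to record the versions actually installed. The following is a minimal sketch using only the Python standard library; it is not part of the paper or its repository. The helper name `collect_versions` is hypothetical, and the distribution names `torch` and `isaacgym` are assumptions about how the packages are installed locally.

```python
from importlib.metadata import PackageNotFoundError, version
import platform


def collect_versions(packages):
    """Map each distribution name to its installed version.

    Packages that are not installed are reported as 'not installed'
    rather than raising, so the report is always complete.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match the local environment.
    report = collect_versions(["torch", "isaacgym"])
    report["python"] = platform.python_version()
    for name, ver in sorted(report.items()):
        print(f"{name}=={ver}")
```

Saving this output alongside experiment logs gives later readers the version information that the table marks as missing.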