Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid
Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed approach. and In our extensive experiments, we train HumanVLA in Isaac Gym [28] with tasks from HITR. Results demonstrate the effectiveness of our method in generalized object rearrangement and vision-language perception. |
| Researcher Affiliation | Collaboration | Xinyu Xu (1,2), Yizheng Zhang (2), Yong-Lu Li (1), Lei Han (2), Cewu Lu (1); (1) Shanghai Jiao Tong University, (2) Tencent Robotics X |
| Pseudocode | No | The paper describes its methods using text and block diagrams (e.g., Figure 2 and 3), but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/AllenXuuu/HumanVLA. |
| Open Datasets | Yes | Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively. ... For the motion dataset used in training, we utilize OMOMO [24] and a locomotion subset from SAMP [14]. |
| Dataset Splits | No | Our experiments are conducted on the HITR dataset. It is split into train and test subsets at a ratio of 9:1, containing 552 and 63 tasks respectively. |
| Hardware Specification | Yes | The teacher policy is optimized using Proximal Policy Optimization [40] and takes two days on eight Tesla V100 GPUs to converge. The student policy is trained using DAgger [39] and takes one day on two GPUs. |
| Software Dependencies | No | We conduct experiments in parallel environments simulated using Isaac Gym [28], with neural networks implemented via PyTorch. The paper does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We provide a hyperparameter table for HumanVLA-Teacher training in Tab. 5. It is shared for both the carry curriculum pre-training and rearrangement learning. and We provide a hyperparameter table for HumanVLA training in Tab. 6. |