Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Authors: Haoqi Yuan, Bohan Zhou, Yuhui Fu, Zongqing Lu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our approach demonstrates an 80% success rate in grasping objects from the YCB dataset across four distinct embodiments using a single vision-based policy. Additionally, our policy exhibits zero-shot generalization to two previously unseen embodiments and significant improvement in efficient finetuning. For further details and videos, visit our project page. ... Experimental results demonstrate that our CrossDex policy outperforms baseline methods on the four training hands and two hands not seen during training.
Researcher Affiliation Academia Haoqi Yuan1, Bohan Zhou1, Yuhui Fu1, Zongqing Lu1,2 1School of Computer Science, Peking University 2Beijing Academy of Artificial Intelligence ... Correspondence to Zongqing Lu <EMAIL>.
Pseudocode Yes A ALGORITHM Algorithm 1: CrossDex training process. Input: Dexterous hand models H; Object set Ω; Human hand pose dataset D; Untrained retargeting networks {P^h_ξ}_{h∈H}, state-based policies {π^S_{ψ_ω}}_{ω∈Ω}, and the vision-based policy π^V_φ. Output: The learned vision-based policy π^V_φ.
Open Source Code No The paper mentions a "project page" for further details and videos, but does not provide a direct link to a code repository or explicitly state that the source code for the methodology is being released. For example: "For further details and videos, visit our project page." and "We provide video records of the real-world test on our project page."
Open Datasets Yes We evaluate CrossDex on 45 daily objects from the YCB dataset (Calli et al., 2015) and 6 dexterous hands, with URDFs provided by Ding et al. (2024). ... To obtain eigengrasps and train the retargeting networks, we use the GRAB dataset (Taheri et al., 2020), which includes 1.6M frames depicting human hands interacting with various objects.
Dataset Splits Yes We use four of these dexterous hands in training, and the remaining two, a 16-DoF, 4-fingered LEAP Hand and a 12-DoF, 5-fingered Inspire Hand, are reserved for testing the model's generalization capabilities. ... For testing the finetuning performance on unseen objects, we use 55 objects from the GRAB dataset (Taheri et al., 2020). For state-based finetuning and certain ablation studies, we select five random objects from the YCB dataset, including mustard bottle, mug, spoon, softball, and cup_j.
Hardware Specification Yes Our experiments with 8,192 environments can be completed on a single NVIDIA RTX 4090 GPU. For the larger-scale experiments involving 16,384 environments and PointNet backbones, we use a single NVIDIA A800 GPU.
Software Dependencies No The paper mentions several software tools and frameworks like Isaac Gym (Makoviychuk et al., 2021), PPO (Schulman et al., 2017), DAgger (Ross et al., 2011), and PointNet (Qi et al., 2017), but it does not specify explicit version numbers for these or any other ancillary software components. For example: "We establish cross-embodiment simulation environments using Isaac Gym (Makoviychuk et al., 2021) to train policies."
Experiment Setup Yes Table 5: Hyperparameters of PPO. Parallel rollout steps per iteration: 8; Training epochs per iteration: 5; Minibatch size: 16384; Optimizer: Adam; Learning rate η: 3e-4; Discount factor γ: 0.96; GAE lambda λ: 0.95; Clip range ϵ: 0.2. ... Table 6: Hyperparameters of DAgger. Parallel rollout steps per iteration: 8; Training epochs per iteration: 5; Minibatch size: 4096; Optimizer: Adam; Learning rate η: 3e-4.
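For reference, the hyperparameters reported in Tables 5 and 6 can be collected into a plain config. This is an illustrative sketch only, the dictionary keys are hypothetical names, not identifiers from the authors' code, and the values are copied from the tables above.

```python
# PPO hyperparameters as reported in Table 5 (key names are illustrative).
PPO_CONFIG = {
    "rollout_steps_per_iter": 8,
    "epochs_per_iter": 5,
    "minibatch_size": 16384,
    "optimizer": "Adam",
    "learning_rate": 3e-4,   # eta
    "discount_gamma": 0.96,  # gamma
    "gae_lambda": 0.95,      # lambda
    "clip_range": 0.2,       # epsilon
}

# DAgger (Table 6) reports the same rollout/epoch/optimizer/learning-rate
# settings but a smaller minibatch, and does not list discount, GAE, or
# clip values, so those keys are dropped here.
DAGGER_CONFIG = {**PPO_CONFIG, "minibatch_size": 4096}
for key in ("discount_gamma", "gae_lambda", "clip_range"):
    DAGGER_CONFIG.pop(key)
```

Such a config could then be passed to a PPO or DAgger trainer; the paper itself does not specify how its implementation organizes these values.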