Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
Authors: Ziye Huang, Haoqi Yuan, Yuhui Fu, Zongqing Lu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | ResDex achieves state-of-the-art performance on the DexGraspNet dataset comprising 3,200 objects with an 88.8% success rate. It exhibits no generalization gap with unseen objects and demonstrates superior training efficiency, mastering all tasks within only 12 hours on a single GPU. For further details and videos, visit our project page. 5 EXPERIMENTS 5.1 EXPERIMENT SETTINGS We evaluate the effectiveness of our method on DexGraspNet (Wang et al., 2023), a large-scale robotic dexterous grasping dataset for thousands of everyday objects. The dataset is split into one training set and two test sets, including one that contains unseen objects in the seen categories and the other that contains unseen objects in unseen categories. The training set includes 3,200 object instances, while the test sets contain a total of 241 object instances. 5.3 ABLATION STUDY Geometry-Agnostic Experts. We compare generalizability between geometry-agnostic policies and policies trained with full state-based observations. We train 3 types of policies on 6 objects, including cell phone, toy figure, bottle, video game console, toilet paper and mug, and we evaluate their performance on the training set, which comprises more than 3,000 objects. The results are shown in Figure 2. Our geometry-agnostic policies achieve higher success rates compared to other policies, achieving over 70% success rates when trained on some objects, which demonstrates remarkable generalizability. Policies with the full observations or the full grasping proposal reward demonstrate poor generalization when trained on some specific objects. Table 1: Success rates of state-based policies. We evaluate our method on three different random seeds. The hyper-policy is trained with four geometry-agnostic base policies. We present the success rates after each multi-task training stage. |
| Researcher Affiliation | Academia | Ziye Huang2, Haoqi Yuan1, Yuhui Fu1, Zongqing Lu1,3 1School of Computer Science, Peking University 2School of EECS, Peking University 3Beijing Academy of Artificial Intelligence Correspondence to Zongqing Lu <EMAIL>. |
| Pseudocode | Yes | Here, we outline the complete pipeline for training ResDex, which consists of three phases. The pseudocode is provided in Appendix A.1. A.1 ALGORITHM SUMMARY Algorithm 1 The Training Pipeline of ResDex |
| Open Source Code | No | For further details and videos, visit our project page. (No direct link to a code repository or explicit statement about code release for the methodology described.) |
| Open Datasets | Yes | We evaluate the effectiveness of our method on DexGraspNet (Wang et al., 2023), a large-scale robotic dexterous grasping dataset for thousands of everyday objects. B.1 RESULTS ON YCB DATASET To further demonstrate the generalizability of our method, we test the learned policy on the YCB Dataset (Calli et al., 2017), which comprises 75 objects. |
| Dataset Splits | Yes | The dataset is split into one training set and two test sets, including one that contains unseen objects in the seen categories and the other that contains unseen objects in unseen categories. The training set includes 3,200 object instances, while the test sets contain a total of 241 object instances. |
| Hardware Specification | Yes | Additionally, ResDex demonstrates remarkable training efficiency, mastering such a wide range of tasks in only 12 hours on a single NVIDIA RTX 4090 GPU. All the state-based policies are trained on a single NVIDIA RTX 4090 GPU. Training a base policy takes about 20 minutes, while training a hyper-policy takes about 11 hours. For the vision-based policy, we train on a single A800 GPU, taking about 16 hours. |
| Software Dependencies | No | We conduct all our experiments in Isaac Gym (Makoviychuk et al., 2021), a GPU-accelerated platform for physics simulation and reinforcement learning. For state-based policies, we use PPO (Schulman et al., 2017) for training. For vision-based policies, we distill the state-based expert policy into a vision-based policy using DAgger (Ross et al., 2011). |
| Experiment Setup | Yes | Table 8: Hyperparameters of PPO. Episode length: 200; Num. envs (base policy): 4096; Num. envs (hyper-policy): 11000; Parallel rollout steps per iteration: 8; Training epochs per iteration: 5; Num. minibatches per epoch: 4; Optimizer: Adam; Clip gradient norm: 1.0; Initial noise std.: 0.8; Clip observations: 5.0; Clip actions: 1.0; Learning rate η: 3e-4; Discount factor γ: 0.96; GAE lambda λ: 0.95; Clip range ϵ: 0.2. Table 9: Hyperparameters of DAgger. Episode length: 200; Num. envs: 11000; Parallel rollout steps per iteration: 1; Training epochs per iteration: 5; Num. minibatches per epoch: 4; Optimizer: Adam; Clip observations: 5.0; Clip actions: 1.0; Learning rate η: 3e-4; Clip range ϵ: 0.2 |
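As an illustration of the reported experiment setup, the PPO hyperparameters from Table 8 can be collected into a configuration sketch. This is a minimal, assumed layout: the dictionary keys and the `gae_advantages` helper are hypothetical names, not from the authors' released code; the helper is a standard Generalized Advantage Estimation computation using the paper's reported γ = 0.96 and λ = 0.95 for a single trajectory that terminates at its final step.

```python
# Hypothetical config sketch of the PPO hyperparameters reported in Table 8.
# Key names are illustrative, not the authors' actual code.
PPO_CONFIG = {
    "episode_length": 200,
    "num_envs_base_policy": 4096,
    "num_envs_hyper_policy": 11000,
    "rollout_steps_per_iteration": 8,
    "epochs_per_iteration": 5,
    "minibatches_per_epoch": 4,
    "optimizer": "Adam",
    "clip_grad_norm": 1.0,
    "initial_noise_std": 0.8,
    "clip_observations": 5.0,
    "clip_actions": 1.0,
    "learning_rate": 3e-4,
    "gamma": 0.96,       # discount factor
    "gae_lambda": 0.95,  # GAE lambda
    "clip_range": 0.2,
}


def gae_advantages(rewards, values, gamma=0.96, lam=0.95):
    """Standard GAE for one trajectory ending in a terminal state.

    rewards[t] and values[t] are per-step rewards and value estimates;
    advantages are accumulated backward: A_t = delta_t + gamma*lam*A_{t+1},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    """
    advantages = []
    next_value = 0.0  # V(terminal state) = 0
    next_adv = 0.0
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v
        next_adv = delta + gamma * lam * next_adv
        advantages.append(next_adv)
        next_value = v
    return list(reversed(advantages))
```

A usage sketch: with `rewards = [1.0, 1.0]` and `values = [0.5, 0.5]`, the last-step advantage is `1.0 + 0.96*0 - 0.5 = 0.5`, and the first-step advantage is `(1.0 + 0.96*0.5 - 0.5) + 0.96*0.95*0.5 = 1.436`.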