Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
Authors: Ziye Huang, Haoqi Yuan, Yuhui Fu, Zongqing Lu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | ResDex achieves state-of-the-art performance on the DexGraspNet dataset comprising 3,200 objects with an 88.8% success rate. It exhibits no generalization gap with unseen objects and demonstrates superior training efficiency, mastering all tasks within only 12 hours on a single GPU. For further details and videos, visit our project page. 5 EXPERIMENTS 5.1 EXPERIMENT SETTINGS We evaluate the effectiveness of our method on DexGraspNet (Wang et al., 2023), a large-scale robotic dexterous grasping dataset for thousands of everyday objects. The dataset is split into one training set and two test sets, including one that contains unseen objects in the seen categories and the other that contains unseen objects in unseen categories. The training set includes 3,200 object instances, while the test sets contain a total of 241 object instances. 5.3 ABLATION STUDY Geometry-Agnostic Experts. We compare generalizability between geometry-agnostic policies and policies trained with full state-based observations. We train 3 types of policies on 6 objects, including cell phone, toy figure, bottle, video game console, toilet paper and mug, and we evaluate their performance on the training set, which comprises more than 3,000 objects. The results are shown in Figure 2. Our geometry-agnostic policies achieve higher success rates compared to other policies, achieving over 70% success rates when trained on some objects, which demonstrates remarkable generalizability. Policies with the full observations or the full grasping proposal reward demonstrate poor generalization when trained on some specific objects. Table 1: Success rates of state-based policies. We evaluate our method on three different random seeds. The hyper-policy is trained with four geometry-agnostic base policies. We present the success rates after each multi-task training stage. |
| Researcher Affiliation | Academia | Ziye Huang2, Haoqi Yuan1, Yuhui Fu1, Zongqing Lu1,3 1School of Computer Science, Peking University 2School of EECS, Peking University 3Beijing Academy of Artificial Intelligence Correspondence to Zongqing Lu <EMAIL>. |
| Pseudocode | Yes | Here, we outline the complete pipeline for training ResDex, which consists of three phases. The pseudocode is provided in Appendix A.1. A.1 ALGORITHM SUMMARY Algorithm 1 The Training Pipeline of ResDex |
| Open Source Code | No | For further details and videos, visit our project page. (No direct link to a code repository or explicit statement about code release for the methodology described.) |
| Open Datasets | Yes | We evaluate the effectiveness of our method on DexGraspNet (Wang et al., 2023), a large-scale robotic dexterous grasping dataset for thousands of everyday objects. B.1 RESULTS ON YCB DATASET To further demonstrate the generalizability of our method, we test the learned policy on the YCB Dataset (Calli et al., 2017), which comprises 75 objects. |
| Dataset Splits | Yes | The dataset is split into one training set and two test sets, including one that contains unseen objects in the seen categories and the other that contains unseen objects in unseen categories. The training set includes 3,200 object instances, while the test sets contain a total of 241 object instances. |
| Hardware Specification | Yes | Additionally, ResDex demonstrates remarkable training efficiency, mastering such a wide range of tasks in only 12 hours on a single NVIDIA RTX 4090 GPU. All the state-based policies are trained on a single NVIDIA RTX 4090 GPU. Training a base policy takes about 20 minutes, while training a hyper-policy takes about 11 hours. For the vision-based policy, we train on a single A800 GPU, taking about 16 hours. |
| Software Dependencies | No | We conduct all our experiments in Isaac Gym (Makoviychuk et al., 2021), a GPU-accelerated platform for physics simulation and reinforcement learning. For state-based policies, we use PPO (Schulman et al., 2017) for training. For vision-based policies, we distill the state-based expert policy into a vision-based policy using DAgger (Ross et al., 2011). |
| Experiment Setup | Yes | Table 8: Hyperparameters of PPO. Episode length: 200; Num. envs (base policy): 4096; Num. envs (hyper-policy): 11000; Parallel rollout steps per iteration: 8; Training epochs per iteration: 5; Num. minibatches per epoch: 4; Optimizer: Adam; Clip gradient norm: 1.0; Initial noise std.: 0.8; Clip observations: 5.0; Clip actions: 1.0; Learning rate η: 3e-4; Discount factor γ: 0.96; GAE lambda λ: 0.95; Clip range ϵ: 0.2. Table 9: Hyperparameters of DAgger. Episode length: 200; Num. envs: 11000; Parallel rollout steps per iteration: 1; Training epochs per iteration: 5; Num. minibatches per epoch: 4; Optimizer: Adam; Clip observations: 5.0; Clip actions: 1.0; Learning rate η: 3e-4; Clip range ϵ: 0.2 |
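As an illustration of the reported experiment setup, the PPO hyperparameters from Table 8 can be collected into a configuration sketch. This is a minimal, assumed layout: the dictionary keys and the `gae_advantages` helper are hypothetical names, not from the authors' released code; the helper is a standard Generalized Advantage Estimation computation using the paper's reported γ = 0.96 and λ = 0.95 for a single trajectory that terminates at its final step.

```python
# Hypothetical config sketch of the PPO hyperparameters reported in Table 8.
# Key names are illustrative, not the authors' actual code.
PPO_CONFIG = {
    "episode_length": 200,
    "num_envs_base_policy": 4096,
    "num_envs_hyper_policy": 11000,
    "rollout_steps_per_iteration": 8,
    "epochs_per_iteration": 5,
    "minibatches_per_epoch": 4,
    "optimizer": "Adam",
    "clip_grad_norm": 1.0,
    "initial_noise_std": 0.8,
    "clip_observations": 5.0,
    "clip_actions": 1.0,
    "learning_rate": 3e-4,
    "gamma": 0.96,       # discount factor
    "gae_lambda": 0.95,  # GAE lambda
    "clip_range": 0.2,
}


def gae_advantages(rewards, values, gamma=0.96, lam=0.95):
    """Standard GAE for one trajectory ending in a terminal state.

    rewards[t] and values[t] are per-step rewards and value estimates;
    advantages are accumulated backward: A_t = delta_t + gamma*lam*A_{t+1},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    """
    advantages = []
    next_value = 0.0  # V(terminal state) = 0
    next_adv = 0.0
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v
        next_adv = delta + gamma * lam * next_adv
        advantages.append(next_adv)
        next_value = v
    return list(reversed(advantages))
```

A usage sketch: with `rewards = [1.0, 1.0]` and `values = [0.5, 0.5]`, the last-step advantage is `1.0 + 0.96*0 - 0.5 = 0.5`, and the first-step advantage is `(1.0 + 0.96*0.5 - 0.5) + 0.96*0.95*0.5 = 1.436`.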