Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Authors: Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, Hao Dong
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/. ... Extensive experiments and detailed analysis of our approach and baseline in both simulation and real-world settings, demonstrating its data efficiency and generalization ability significantly outperforming baseline methods. ... We evaluate our method on 14 garment manipulation tasks with varying deformation characteristics. ... Each task is evaluated over 50 episodes with three different seeds. We report success rates as Mean ± Std across all trials. |
| Researcher Affiliation | Academia | Yuran Wang¹* Ruihai Wu¹* Yue Chen¹* Jiarui Wang¹ Jiaqi Liang¹ Ziyu Zhu¹ Haoran Geng² Jitendra Malik² Pieter Abbeel² Hao Dong¹; ¹Peking University, ²University of California, Berkeley |
| Pseudocode | No | The paper describes methods and algorithms like Garment Affordance Model (GAM) and Structure-Aware Diffusion Policy (SADP) in detail, including their components and training processes. However, it does not present any formal pseudocode blocks or algorithms labeled as such in figures or sections. The steps are described in paragraph form and through diagrams like Figure 5 and Figure 6. |
| Open Source Code | Yes | Our project page is available at: https://wayrise.github.io/DexGarmentLab/. ... Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Please refer to our code repo https://wayrise.github.io/DexGarmentLab/. |
| Open Datasets | Yes | We use garment models from the ClothesNet dataset [45], which contains over 2,500 garments across 8 categories (e.g., tops, coats, trousers, dresses, etc.), and build environment-interaction assets (such as hangers, pothooks, humans, etc.). |
| Dataset Splits | Yes | During data collection and policy evaluation, we randomly selected the position of the garment within a certain range. For this task, the initial position of the tops is randomized within a rectangular area defined by -0.10<x<0.10 and 0.70<y<0.90. ... During data collection, we first selected 100 garments and then randomly chose from these 100 garments for data collection. During policy randomization, we randomly select from all 247 garments to ensure that the validation set contains data that was not seen during training. ... For each task, we have 3 distinct garments per category, each with 5 initial deformations. ... During dataset loading, the test set is configured to comprise 2% of the entire training dataset. |
| Hardware Specification | Yes | Each experiment is conducted on an RTX 4090 GPU, and consumes about 22 GB GPU Memory for training. ... All experiments are conducted on an NVIDIA A800 GPU, with approximately 75 GB of GPU memory consumption when training with a batch size of 200. ... Our setup comprises two RealMan RM75-6F (Arms) with Psibot G0-R (Dexterous Hands) and a RealSense D435 camera (As shown in Appendix E). ... To this end, as shown in Fig. 10, we aligned the settings of both the simulation and real-world environments by using the same hardware setup Shadow Hand and UR10e and selected two tasks, Hang Trousers and Wear Hat, for policy-level sim-to-real transfer, which means way 2. ... To address the limited precision of the Realsense D435 in this context, we employed a Kinect camera for more accurate point cloud acquisition. ... Two UR10e paired with Shadow Hands and an Azure Kinect camera are used in our sim-to-real experiment. |
| Software Dependencies | Yes | In this section, we present the construction of DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation and built upon Isaac Sim 4.5.0. ... We use PyTorch as our Deep Learning framework. ... We use Segment-Anything-2 [30] to segment the garment and interaction object from the scene and obtain corresponding point clouds. |
| Experiment Setup | Yes | Hyper-Parameters Selection. The training of GAM follows the hyperparameter settings used in the UniGarmentManip framework. We set the number of skeleton pairs to be 50 and batch size to be 32. In each batch, we sample 32 garment pairs. For each garment pair, we sample 20 positive and 150 negative point pairs for each positive point pair. Therefore, in each batch, 32 × 32 × 20 data will be used to update the model. During the Correspondence training stage, we train the model for 40,000 batches. ... Hyper-Parameters Selection. We set the horizon, observation_steps, and action_steps to 8, 3, and 4, respectively. The number of denoising steps in the diffusion process is set to 10. The model is trained for a total of 3000 epochs, with validation performed every 25 epochs and checkpoints saved every 100 epochs. We use the AdamW optimizer with an initial learning rate of 1 × 10⁻⁴, and adopt a cosine learning rate scheduler with 500 warm-up steps. During dataset loading, the test set is configured to comprise 2% of the entire training dataset. |
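
To make the quoted SADP training settings concrete, the following is a minimal sketch of how a cosine learning-rate schedule with 500 warm-up steps and a 2% held-out split could be expressed. Only the numeric values (learning rate 1 × 10⁻⁴, 500 warm-up steps, 2% ratio) come from the Experiment Setup row. The function names, the linear warm-up shape, decay to zero, the shuffling seed, and the total step count are assumptions for illustration and are not taken from the paper or its code.

```python
import math
import random

# Values quoted in the Experiment Setup row.
BASE_LR = 1e-4
WARMUP_STEPS = 500
VAL_RATIO = 0.02


def lr_at_step(step, total_steps, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Linear warm-up, then cosine decay to zero.

    The warm-up shape and the zero floor are assumptions; the source only
    states "cosine learning rate scheduler with 500 warm-up steps".
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def split_episodes(n_episodes, val_ratio=VAL_RATIO, seed=0):
    """Shuffle episode indices and hold out `val_ratio` of them.

    Hypothetical helper: returns (train_indices, val_indices).
    """
    rng = random.Random(seed)
    indices = list(range(n_episodes))
    rng.shuffle(indices)
    n_val = max(1, round(n_episodes * val_ratio))
    return indices[n_val:], indices[:n_val]


if __name__ == "__main__":
    total = 10_000  # hypothetical total number of optimizer steps
    for s in (0, 250, 500, 5_250, 10_000):
        print(s, lr_at_step(s, total))
    train_idx, val_idx = split_episodes(100)
    print(len(train_idx), len(val_idx))
```

The schedule is written as a pure function of the step index so it can be checked independently of any optimizer; in a PyTorch setup the same function could be divided by `BASE_LR` and passed to a lambda-based scheduler, though whether the authors do so is not stated in the excerpts above.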