Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
Authors: Wentao Xu, Huiqiao Fu, Haoyu Dong, Zhehao Zhou, Chunlin Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we evaluate DEAL on five sim-to-sim tasks (Allegro Hand, Humanoid, Go2, Cartpole, Ant) and two sim-to-real tasks (Cartpole, Go2). First, we evaluate DEAL's parameter identification capabilities, particularly in high-dimensional settings. Using a policy trained with Uniform Domain Randomization (UDR), we collect demonstrations in the target environment and conduct parameter searches with DEAL to redefine the simulator. The policy is then retrained in the enhanced simulator and its transfer performance is tested in the target domain. We further assess DEAL's adaptability by expanding the search scale and analyzing its dependence on target-domain demonstrations. Finally, we complete the challenging sim-to-real transfer task. Experimental results show that DEAL achieves state-of-the-art stability and identification accuracy in high-dimensional parameter identification tasks, effectively bridging the sim-to-real gap with limited real-world data. |
| Researcher Affiliation | Academia | Department of Control Science and Intelligent Engineering, School of Management and Engineering, Nanjing University, China. |
| Pseudocode | Yes | The schematic overview of the DEAL architecture is shown in Fig. 1, and the pseudo-code is shown in Algorithm 1. |
| Open Source Code | No | We are in the process of organizing the experiment code and we will provide open access to well-documented code. |
| Open Datasets | Yes | Our experiments employed Isaac Gym [46] as the simulator. In Isaac Gym, we can collect trajectories in parallel in hundreds of environments with different parameters, which means we can evaluate hundreds of parameters in parallel. During parameter search, we instantiate 200 parallel environments to assess reality-to-simulation alignment across varying physical parameters. For Cartpole, Ant, Humanoid and Allegro Hand, we implement Soft Actor-Critic (SAC) [47] to train a neural network as RL controller. For Go2, we adopt RMA [33] to develop controllers tracking velocity commands and climbing platforms, and train canter controllers using Ess-Info GAIL [48] for bio-inspired running and command following. |
| Dataset Splits | No | The paper discusses collecting "demonstrations" and "trajectories" from the target environment but does not explicitly detail any training, validation, or test splits for these collected data points for the purpose of evaluating the DEAL model or its components. It refers to "limited real-world demonstration collected by π0" but not how these are split for experiment validation. |
| Hardware Specification | Yes | In our experiment, running on a PC equipped with Intel i5-14600KF and RTX 4060 Ti, DEAL can complete the entire search process within a few minutes; the search computation cost can be found in Appendix A.2. |
| Software Dependencies | No | Our experiments employed Isaac Gym [46] as the simulator. In Isaac Gym, we can collect trajectories in parallel in hundreds of environments with different parameters, which means we can evaluate hundreds of parameters in parallel. For Cartpole, Ant, Humanoid and Allegro Hand, we implement Soft Actor-Critic (SAC) [47] to train a neural network as RL controller. For Go2, we adopt RMA [33] to develop controllers tracking velocity commands and climbing platforms, and train canter controllers using Ess-Info GAIL [48] for bio-inspired running and command following. The discriminator used in DEAL adopts a fully connected MLP with input (s, a, s'), two hidden layers of 256 units with ReLU activation, and a scalar output. No specific version numbers for these software/libraries are provided. |
| Experiment Setup | Yes | During parameter search, we instantiate 200 parallel environments to assess reality-to-simulation alignment across varying physical parameters. For Cartpole, Ant, Humanoid and Allegro Hand, we implement Soft Actor-Critic (SAC) [47] to train a neural network as RL controller. For Go2, we adopt RMA [33] to develop controllers tracking velocity commands and climbing platforms, and train canter controllers using Ess-Info GAIL [48] for bio-inspired running and command following. The initial search range U for the parameters of each task is the same as its training range; we performed 50 steps of parameter search for each task with limited real-world demonstrations collected by π0. The UDR policy π0 of Ant and Go2 was trained within the range of [1/3 θt, 3 θt]; the remaining tasks were trained within the range of [1/5 θt, 5 θt], where θt denotes the parameters of the target domain. The discriminator used in DEAL adopts a fully connected MLP with input (s, a, s'), two hidden layers of 256 units with ReLU activation, and a scalar output. To further stabilize the training process, we use Weight Clipping [45] to keep the weights of the discriminator in a compact interval to satisfy the Lipschitz continuity condition, ensuring the effective computation of the Wasserstein distance. |
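
The Experiment Setup row describes multiplicative UDR ranges around the target parameters θt, 200 parallel environments, and weight clipping for the discriminator. The following is a minimal Python sketch of those three ideas, not the authors' code (which is not yet released). The function names, the seed, the example parameter values, and the clipping bound `c=0.01` are all assumptions made for illustration; the sketch also assumes positive physical parameters and only shows a uniform draw over the range, not DEAL's diffusion-evolution search itself.

```python
import random


def udr_range(theta_t, factor):
    """Multiplicative range [theta_t / factor, factor * theta_t] for one parameter.

    factor=3 matches the range quoted for Ant and Go2; factor=5 matches the
    range quoted for the remaining tasks. Assumes theta_t > 0.
    """
    return (theta_t / factor, theta_t * factor)


def sample_candidates(theta_t, factor, n_envs=200, seed=0):
    """Draw one parameter vector per parallel environment, uniformly in range.

    theta_t is a list of target-domain parameter values (hypothetical here).
    n_envs=200 follows the number of parallel environments quoted above.
    """
    rng = random.Random(seed)
    bounds = [udr_range(t, factor) for t in theta_t]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_envs)]


def clip_weights(weights, c=0.01):
    """Clamp each weight into [-c, c]; c is an assumed value, not from the paper."""
    return [max(-c, min(c, w)) for w in weights]


if __name__ == "__main__":
    # Hypothetical two-parameter task, e.g. a mass and a friction coefficient.
    candidates = sample_candidates([1.0, 0.5], factor=3)
    print(len(candidates), candidates[0])
    print(clip_weights([-0.5, 0.005, 0.2]))
```

In a real pipeline each sampled vector would configure one simulator instance, and the clipping step would be applied to the discriminator's weights after every update.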