Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
Authors: Haoxiang You, Yilang Liu, Ian Abraham
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. Experiments show that our approach significantly reduces wall-clock training time and consistently outperforms all baseline methods in terms of final returns. |
| Researcher Affiliation | Academia | Haoxiang You, Department of Mechanical Engineering, Yale University, New Haven, CT 06520, EMAIL; Yilang Liu, Department of Mechanical Engineering, Yale University, New Haven, CT 06520, EMAIL; Ian Abraham, Department of Mechanical Engineering, Department of Computer Science, Yale University, New Haven, CT 06520, EMAIL |
| Pseudocode | Yes | Algorithm 1: D.VA (Decoupled Visual Based Analytical Policy Gradient) |
| Open Source Code | Yes | Videos and code are available on https://haoxiangyou.github.io/Dva_website/. The code is available at https://github.com/HaoxiangYou/D.VA |
| Open Datasets | Yes | We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. We select four classical RL tasks across different complexity levels. For the three benchmark RL methods and the state-to-visual tasks, we employ ManiSkill-V3 [Tao et al., 2024] for rendering. |
| Dataset Splits | No | The paper uses dynamic simulation environments (Cartpole, Hopper, Ant, Humanoid) where data is generated through interaction, not from static pre-split datasets. It describes training for a certain number of environment steps and reports results averaged over five random seeds, which is standard for RL. However, it does not explicitly provide information on fixed training/test/validation dataset splits in terms of percentages, sample counts, or predefined files. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA GeForce RTX 4080 GPU (16GB) with an Intel Xeon W5-2445 CPU and 256GB RAM. |
| Software Dependencies | No | The paper mentions "PyTorch3D [Ravi et al., 2020]" for differentiable rendering and "ManiSkill-V3 [Tao et al., 2024]" for rendering, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All hyperparameters are listed in Appendix C, while additional details on setup are provided in the Appendix E. Detailed hyperparameter values are provided at the end. For example, Table 8: D.Va training parameters lists "Short horizon length h", "Number of parallel environments N", "Actor learning rate", "Critic learning rate", "Target value network α", "Discount factor γ", "Value estimation λ", "Adam (β1, β2)", "Number of critic training iterations", "Number of critic training minibatches". |
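
The Software Dependencies row is marked "No" because no version numbers are reported. As an illustration only, and not something taken from the paper or its repository, the following minimal sketch shows one way such versions could be recorded from a working environment using Python's standard `importlib.metadata`. The helper names `collect_versions` and `format_pins` are hypothetical, and the distribution names in the usage section (`torch`, `pytorch3d`, `mani_skill`) are assumptions that may not match the names actually installed.

```python
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


def format_pins(versions):
    """Render installed packages as sorted 'name==version' lines; skip missing ones."""
    return [
        f"{name}=={ver}"
        for name, ver in sorted(versions.items())
        if ver is not None
    ]


if __name__ == "__main__":
    # Hypothetical distribution names; adjust to the actual environment.
    candidates = ["torch", "pytorch3d", "mani_skill"]
    for line in format_pins(collect_versions(candidates)):
        print(line)
```

Saving the printed lines alongside the results would give readers the pinned versions that this variable checks for.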