Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DMWM: Dual-Mind World Model with Long-Term Imagination

Authors: Lingyi Wang, Rashed Shelim, Walid Saad, Naren Ramakrishnan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The proposed framework is evaluated on benchmark tasks that require long-term planning from the DMControl suite and the robotic platforms. Extensive experimental results demonstrate that the proposed framework yields significant improvements in terms of logical coherence, trial efficiency, data efficiency and long-term imagination over the state-of-the-art world models.
Researcher Affiliation	Academia	(1) Department of Electrical and Computer Engineering, Virginia Tech, USA; (2) Department of Computer Science, Virginia Tech, USA. Emails: EMAIL
Pseudocode	Yes	Algorithm 1: DMWM With Actor-Critic; Algorithm 2: DMWM With Grad-MPC
Open Source Code	Yes	The code is available at https://github.com/news-vt/DMWM.
Open Datasets	Yes	The training environments consist of 20 continuous control tasks from the DeepMind Control (DMC) suite, 4 robotic tasks from the ManiSkill2 platform, and 4 robotic tasks from the MyoSuite platform.
Dataset Splits	Yes	Logical consistency data for 20 tasks with different horizon sizes is provided in Appendix H.1. The mean and variance of logical consistency are reported over 100 test episodes with the horizon size H = 30.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments. It mentions hyperparameters and environment settings but no specific GPU/CPU models or other hardware details for the experimental runs.
Software Dependencies	No	The paper does not explicitly mention specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA. It refers to DreamerV3 and other methods, but not the versions of the underlying software stack used for implementation.
Experiment Setup	Yes	The hyperparameters of models are presented in TABLE 3 (columns: Parameter, Symbol, Value). Replay memory size = 1e6; Batch size B = 50; Sequence length L = 64; Seed episode S = 5; Training episodes N = 1e3; Collect interval C = 100; Max episode length = 500; Exploration noise = 0.3; Imagination horizon H = 30; Gradient clipping = 100. RSSM-S1: Activation function = ReLU; Embedding size = 1024; Hidden size = 200; Belief size = 200; State size = 30; Overshooting distance = 50; Overshooting KL-beta = 0; Global KL-beta = 0; Overshooting reward scale = 0; Free nats = 3; Bit-depth = 5; Weights ϖdyn, ϖrep = 1; Optimizer = Adam; Adam epsilon = 1e-4; Learning rate ηψ = 1e-3. LINN-S2: Reasoning depth α = 30; Logic vector size |v|, |m| = 64; L2 weight βℓ2 = 1e-5; Regularization weight βreg = 1; Logic MLP number = 3; Optimizer = SGD; Learning rate ηw = 1e-2. Actor-Critic [5]: Return lambda λ = 0.95; Planning horizon discount = 0.99; Optimizer = Adam; Adam epsilon = 1e-4; Learning rate ηϑ, ηψ = 1e-4. Grad-MPC [11]: Iterations I = 40; Candidate size J = 1000; Learning rate ηR = 0.1-0.01-0.005-0.0001. TD-MPC2: refer to [13].
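
For readers who want to carry the Experiment Setup values into their own runs, the sketch below collects part of the row above (the general training settings, RSSM-S1, and LINN-S2) into Python dataclasses. It is only an illustration: the class and field names are hypothetical and are not taken from the paper or from the DMWM repository, and the Actor-Critic, Grad-MPC, and TD-MPC2 settings are left out for brevity. Only the numeric values and optimizer names come from the row.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RSSMS1Config:
    """RSSM-S1 values from the row above (field names are hypothetical)."""
    activation: str = "relu"
    embedding_size: int = 1024
    hidden_size: int = 200
    belief_size: int = 200
    state_size: int = 30
    overshooting_distance: int = 50
    overshooting_kl_beta: float = 0.0
    global_kl_beta: float = 0.0
    overshooting_reward_scale: float = 0.0
    free_nats: float = 3.0
    bit_depth: int = 5
    dyn_weight: float = 1.0   # listed as "Weights ϖdyn, ϖrep = 1"
    rep_weight: float = 1.0
    optimizer: str = "adam"
    adam_epsilon: float = 1e-4
    learning_rate: float = 1e-3


@dataclass(frozen=True)
class LINNS2Config:
    """LINN-S2 values from the row above (field names are hypothetical)."""
    reasoning_depth: int = 30
    logic_vector_size: int = 64
    l2_weight: float = 1e-5
    regularization_weight: float = 1.0
    logic_mlp_number: int = 3
    optimizer: str = "sgd"
    learning_rate: float = 1e-2


@dataclass(frozen=True)
class TrainingConfig:
    """General settings listed before the RSSM-S1 group (names hypothetical)."""
    replay_memory_size: int = 1_000_000
    batch_size: int = 50
    sequence_length: int = 64
    seed_episodes: int = 5
    training_episodes: int = 1_000
    collect_interval: int = 100
    max_episode_length: int = 500
    exploration_noise: float = 0.3
    imagination_horizon: int = 30
    gradient_clipping: float = 100.0
    rssm_s1: RSSMS1Config = field(default_factory=RSSMS1Config)
    linn_s2: LINNS2Config = field(default_factory=LINNS2Config)


if __name__ == "__main__":
    config = TrainingConfig()
    print(config)
```

Freezing the dataclasses keeps a run's settings immutable once constructed, which makes it harder to drift from the reported values by accident; a variant can be built explicitly with `dataclasses.replace(config, batch_size=...)`.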