Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DMWM: Dual-Mind World Model with Long-Term Imagination
Authors: Lingyi Wang, Rashed Shelim, Walid Saad, Naren Ramakrishnan
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed framework is evaluated on benchmark tasks that require long-term planning from the DMControl suite and the robotic platforms. Extensive experimental results demonstrate that the proposed framework yields significant improvements in terms of logical coherence, trial efficiency, data efficiency and long-term imagination over the state-of-the-art world models. |
| Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Virginia Tech, USA; (2) Department of Computer Science, Virginia Tech, USA |
| Pseudocode | Yes | Algorithm 1: DMWM With Actor-Critic; Algorithm 2: DMWM With Grad-MPC |
| Open Source Code | Yes | The code is available at https://github.com/news-vt/DMWM. |
| Open Datasets | Yes | The training environments consist of 20 continuous control tasks from the DeepMind Control (DMC) suite, 4 robotic tasks from the ManiSkill2 platform, and 4 robotic tasks from the MyoSuite platform. |
| Dataset Splits | Yes | Logical consistency data for 20 tasks with different horizon sizes is provided in Appendix H.1. The mean and variance of logical consistency are reported over 100 test episodes with the horizon size H = 30. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions hyperparameters and environment settings but no specific GPU/CPU models or other hardware details for the experimental runs. |
| Software Dependencies | No | The paper does not explicitly mention specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA. It refers to DreamerV3 and other methods, but not the versions of the underlying software stack used for implementation. |
| Experiment Setup | Yes | The hyperparameters of models are presented in TABLE 3 (parameter, symbol, value). General: replay memory size = 1e6; batch size B = 50; sequence length L = 64; seed episode S = 5; training episodes N = 1e3; collect interval C = 100; max episode length = 500; exploration noise = 0.3; imagination horizon H = 30; gradient clipping = 100. RSSM-S1: activation function = ReLU; embedding size = 1024; hidden size = 200; belief size = 200; state size = 30; overshooting distance = 50; overshooting KL-beta = 0; global KL-beta = 0; overshooting reward scale = 0; free nats = 3; bit-depth = 5; weights ϖ_dyn, ϖ_rep = 1; optimizer = Adam; Adam epsilon = 1e-4; learning rate η_ψ = 1e-3. LINN-S2: reasoning depth α = 30; logic vector size \|v\|, \|m\| = 64; L2 weight β_ℓ2 = 1e-5; regularization weight β_reg = 1; logic MLP number = 3; optimizer = SGD; learning rate η_w = 1e-2. Actor-Critic [5]: return lambda λ = 0.95; planning horizon discount = 0.99; optimizer = Adam; Adam epsilon = 1e-4; learning rate η_ϑ, η_ψ = 1e-4. Grad-MPC [11]: iterations I = 40; candidate size J = 1000; learning rate η_R = 0.1-0.01-0.005-0.0001. TD-MPC2: refer to [13]. |
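
For anyone attempting a reproduction, the Table 3 values quoted in the Experiment Setup row could be collected into a typed configuration object. The following is a minimal sketch only: every class and field name is hypothetical and is not taken from the authors' repository, and reading the Grad-MPC learning rate "0.1-0.01-0.005-0.0001" as a sequence of four values is an assumption.

```python
from dataclasses import asdict, dataclass, field
from typing import Tuple


# All class and field names below are illustrative assumptions;
# only the numeric values come from Table 3 as quoted above.
@dataclass(frozen=True)
class TrainingConfig:
    replay_memory_size: int = 1_000_000  # 1e6
    batch_size: int = 50                 # B
    sequence_length: int = 64            # L
    seed_episodes: int = 5               # S
    training_episodes: int = 1_000       # N = 1e3
    collect_interval: int = 100          # C
    max_episode_length: int = 500
    exploration_noise: float = 0.3
    imagination_horizon: int = 30        # H
    gradient_clipping: float = 100.0


@dataclass(frozen=True)
class RssmS1Config:
    activation: str = "relu"
    embedding_size: int = 1024
    hidden_size: int = 200
    belief_size: int = 200
    state_size: int = 30
    overshooting_distance: int = 50
    overshooting_kl_beta: float = 0.0
    global_kl_beta: float = 0.0
    overshooting_reward_scale: float = 0.0
    free_nats: float = 3.0
    bit_depth: int = 5
    loss_weight_dyn: float = 1.0
    loss_weight_rep: float = 1.0
    optimizer: str = "Adam"
    adam_epsilon: float = 1e-4
    learning_rate: float = 1e-3


@dataclass(frozen=True)
class LinnS2Config:
    reasoning_depth: int = 30            # alpha
    logic_vector_size: int = 64          # |v|, |m|
    l2_weight: float = 1e-5
    regularization_weight: float = 1.0
    logic_mlp_number: int = 3
    optimizer: str = "SGD"
    learning_rate: float = 1e-2


@dataclass(frozen=True)
class ActorCriticConfig:
    return_lambda: float = 0.95
    planning_horizon_discount: float = 0.99
    optimizer: str = "Adam"
    adam_epsilon: float = 1e-4
    learning_rate: float = 1e-4


@dataclass(frozen=True)
class GradMpcConfig:
    iterations: int = 40                 # I
    candidate_size: int = 1000           # J
    # Assumption: the source lists "0.1-0.01-0.005-0.0001"; it is stored
    # here as four separate values without implying how they are scheduled.
    learning_rates: Tuple[float, ...] = (0.1, 0.01, 0.005, 0.0001)


@dataclass(frozen=True)
class DMWMConfig:
    training: TrainingConfig = field(default_factory=TrainingConfig)
    rssm_s1: RssmS1Config = field(default_factory=RssmS1Config)
    linn_s2: LinnS2Config = field(default_factory=LinnS2Config)
    actor_critic: ActorCriticConfig = field(default_factory=ActorCriticConfig)
    grad_mpc: GradMpcConfig = field(default_factory=GradMpcConfig)


if __name__ == "__main__":
    config = DMWMConfig()
    # Dump the nested configuration as a plain dict, e.g. for logging.
    print(asdict(config))
```

Freezing the dataclasses keeps a run's settings immutable once constructed, and `asdict` gives a plain dictionary that can be logged alongside results. The TD-MPC2 settings are not included because the source only refers to [13] for them.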