Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Authors: Yang Zhang, Xinran Li, Jianing Ye, Shuang Qiu, Delin Qu, Xiu Li, Chongjie Zhang, Chenjia Bai

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate DIMA on challenging continuous MARL benchmarks, including MAMuJoCo [22] and Bi-DexHands [23], in low-data regimes. Experimental results show that DIMA consistently improves the prediction accuracy of environment dynamics and outperforms both model-free and strong model-based MARL baselines in terms of final return and sample efficiency.
Researcher Affiliation	Collaboration	1Tsinghua University, 2The Hong Kong University of Science and Technology, 3Washington University in St. Louis, 4City University of Hong Kong, 5Fudan University, 6Institute of Artificial Intelligence (Tele AI), China Telecom, 7Shenzhen Research Institute of Northwestern Polytechnical University
Pseudocode	Yes	We summarize the overall training procedure of DIMA paired with learning in imaginations in Algorithm 1 below. We denote as D the replay databuffer which stores data collected from the real environment.
Open Source Code	Yes	Codes are open-sourced at https://github.com/breez3young/DIMA.
Open Datasets	Yes	We evaluate our method on two widely-used multi-agent continuous control benchmarks requiring heterogeneous-agent cooperation: Multi-Agent MuJoCo (MAMuJoCo) [22] and Bimanual Dexterous Hands (Bi-DexHands) [23].
Dataset Splits	No	To highlight the sample efficiency of learning in imaginations, we adopt a low-data regime [66], limiting real-environment samples to 1M for MAMuJoCo and 300k for Bi-DexHands, adjusted for their different episode lengths.
Hardware Specification	Yes	All our experiments including the evaluation of chosen baselines are run on a machine with a single NVIDIA RTX 4090 GPU, a 24-core CPU, and 256GB RAM.
Software Dependencies	No	Our implementation is based on the open-source repository: https://github.com/lucidrains/vector-quantize-pytorch.
Experiment Setup	Yes	Table 4: Behaviour learning hyperparameters. Table 9: Architecture details. Table 10: Hyperparameters for DIMA.