Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

Authors: Yining Shi, Kun Jiang, Qiang Meng, Ke Wang, JiaBao Wang, Wenchao Sun, Tuopu Wen, Mengmeng Yang, Diange Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, fusion-based occupancy) and prediction horizons (3s and 8s).
Researcher Affiliation | Collaboration | (1) School of Vehicle and Mobility, Tsinghua University; (2) State Key Laboratory of Intelligent Green Vehicle and Mobility; (3) Kargobot Inc; (4) Nankai University
Pseudocode | No | The paper describes the methodology in Section 3 and illustrates it with Figure 2, but it does not contain any explicitly labeled pseudocode blocks or algorithms in a structured format.
Open Source Code | Yes | Code is available at https://github.com/synsin0/COME.
Open Datasets | Yes | Most experiments are conducted on the widely used Occ3D-nuScenes [22] benchmark, which offers 3D occupancy labels for 18 categories based on the large-scale nuScenes [3] dataset. We also use the Occ3D-Waymo [22] benchmark based on the Waymo Open Dataset [21] (WOD), which has 3D occupancy labels for 16 categories.
Dataset Splits | Yes | The dataset is split into 700 training, 150 validation, and 150 test driving sequences, each lasting 20 seconds. ... The dataset is split into 798 training and 202 validation driving sequences.
Hardware Specification | Yes | All models are trained on 4 H20 GPUs and use a learning rate of 1e-4 if not stated specifically.
Software Dependencies | Yes | The proposed algorithm runs in the python3.9 and torch2.5.1 environment and is expected to be compatible with the torch2.x environment. The environment needs to have mmcv 2.x and mmdet3d 1.1.x installed, and it is basically the same as the environment configuration scheme of OccWorld [35] and DOME [4].
Experiment Setup | Yes | (1) Diffusion-based World Model. We adopt the pre-trained Occ-VAE from DOME [4] and train the diffusion-based world model for 2000 epochs with a total batch size of 128 and a learning rate of 2e-4. (2) Scene-centric Forecasting Module. This module is trained for 12 epochs using a total batch size of 32 and the CBGS resampling strategy [36]. (3) COME ControlNet. This component is trained for 1000 epochs with a total batch size of 64.
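
The three training stages quoted under Experiment Setup can be collected into a small configuration sketch. This is only an illustration, not the authors' code: the names StageConfig, STAGES, NUM_GPUS, and DEFAULT_LR are hypothetical. It rests on two assumptions: that the default learning rate of 1e-4 quoted under Hardware Specification applies to the stages where no rate is given, and that the total batch size is split evenly across the 4 GPUs.

```python
from dataclasses import dataclass

NUM_GPUS = 4        # "4 H20 GPUs" as quoted in the Hardware Specification row
DEFAULT_LR = 1e-4   # default learning rate quoted in the same row


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for one training stage's quoted settings."""
    name: str
    epochs: int
    total_batch_size: int
    # Assumption: the default rate applies when the quote gives no rate.
    learning_rate: float = DEFAULT_LR

    def per_gpu_batch_size(self, num_gpus: int = NUM_GPUS) -> int:
        # Assumption: the total batch is divided evenly across GPUs.
        if self.total_batch_size % num_gpus != 0:
            raise ValueError(
                f"{self.name}: batch size {self.total_batch_size} "
                f"is not divisible by {num_gpus} GPUs"
            )
        return self.total_batch_size // num_gpus


# Values taken from the Experiment Setup row; stage names are illustrative.
STAGES = (
    StageConfig("diffusion_world_model", epochs=2000,
                total_batch_size=128, learning_rate=2e-4),
    StageConfig("scene_centric_forecasting", epochs=12, total_batch_size=32),
    StageConfig("come_controlnet", epochs=1000, total_batch_size=64),
)

if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, stage.epochs, stage.learning_rate,
              stage.per_gpu_batch_size())
```

Under the even-split assumption, the per-GPU batch sizes are 32, 8, and 16 for the three stages. Optimizer, scheduler, and CBGS resampling details are not covered by the quoted text and are left out here.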