Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ODG: Occupancy Prediction Using Dual Gaussians
Authors: Yunxiao Shi, Yinhao Zhu, Herbert Cai, Shizhong Han, Jisoo Jeong, Amin Ansari, Fatih Porikli
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Occ3D-nuScenes and Occ3D-Waymo benchmarks demonstrate that our proposed method sets new state-of-the-art results while maintaining low inference cost. §4 Experiments, §4.1 Experiment Setup: Datasets: We evaluate our model on the Occ3D benchmark [50], which bootstraps the nuScenes [6] and Waymo-Open [45] datasets. Evaluation Metrics: We evaluate our model under the mIoU and RayIoU [47] metrics. Implementation Details: We implement our proposed method in PyTorch [42]. §4.2 Evaluation Results: In this section, we report evaluation results on the Occ3D benchmark [50] and compare with the latest state-of-the-art methods. §4.4 Ablation Studies: In this section, we conduct multiple ablation studies to analyze the effects of various components in our proposed ODG. |
| Researcher Affiliation | Industry | Yunxiao Shi, Yinhao Zhu, Shizhong Han, Jisoo Jeong, Amin Ansari, Hong Cai, Fatih Porikli. Qualcomm AI Research, Qualcomm Technologies, Inc. Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. |
| Pseudocode | Yes | Algorithm 1: Coarse-to-fine refinement in ODG. |
| Open Source Code | No | We will aim to release the code, which will be subject to our institution's review and approval. The datasets used in our experiments are publicly available. |
| Open Datasets | Yes | Datasets: We evaluate our model on the Occ3D benchmark [50], which bootstraps the nuScenes [6] and Waymo-Open [45] datasets. nuScenes consists of 1,000 scenes with a split of 700/150/150 for training, validation and testing. Occ3D-nuScenes annotates 3D occupancy ground truth with 17 semantic classes. Waymo Open [45] has 798 training scenes and 202 validation scenes. |
| Dataset Splits | Yes | nuScenes consists of 1,000 scenes with a split of 700/150/150 for training, validation and testing. On Waymo, we sample 20% of the data, matching practices in previous works [53, 50]. |
| Hardware Specification | Yes | Unless otherwise specified, we train all our models with a global batch size of 8 for 100 epochs using NVIDIA A100 GPUs. During inference, we adopt the standard practice and make use of the camera visibility masks provided by the dataset [50] and only evaluate in unoccluded regions. Inference runtime is measured on a single idle A100 GPU with the PyTorch fp32 backend. |
| Software Dependencies | No | We implement our proposed method in PyTorch [42]. Following previous works [35, 53, 4], we use ResNet-50 [20] as the image backbone to extract multi-camera image features. We use AdamW [38] as the optimizer with a weight decay of 0.01. We train all our models with an initial learning rate of 2 × 10⁻⁴ that decays with a Cosine Annealing [39] schedule. We profiled ODG-L at inference time with DeepSpeed [43]. |
| Experiment Setup | Yes | Implementation Details: We implement our proposed method in PyTorch [42]. Following previous works [35, 53, 4], we use ResNet-50 [20] as the image backbone to extract multi-camera image features. On nuScenes, we resize input images to a resolution of 256 × 704. On Waymo, all input images are resized and padded to 640 × 960. For Ours-tiny, we set the number of static Gaussian queries S = 500 and the number of dynamic Gaussian queries D = 100. For Ours-large, we set S = 4000 and D = 800. We use L = 6 transformer layers to conduct coarse-to-fine prediction. We set $\lambda_{3d} = 0.2$ to balance the box loss $\mathcal{L}_{box}$ and the occupancy loss $\mathcal{L}_{occ}$. For the rendering loss $\mathcal{L}_r$, we set $\lambda = 0.05$ for stages $\ell \in \{1, 6\}$ and $\lambda = 0.01$ for the rest. We use AdamW [38] as the optimizer with a weight decay of 0.01. We train all our models with an initial learning rate of 2 × 10⁻⁴ that decays with a Cosine Annealing [39] schedule. For experiments on Waymo, we sample 20% of the data, matching practices in previous works [53, 50]. Unless otherwise specified, we train all our models with a global batch size of 8 for 100 epochs using NVIDIA A100 GPUs. |
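The Research Type and Hardware Specification rows mention mIoU evaluated only in camera-visible regions. The sketch below is a minimal, single-sample illustration of that idea, assuming integer voxel label grids, a boolean visibility mask of the same shape, and a "free" label that lies outside 0..num_classes-1 so it is never scored as a class. The function name `occupancy_miou` is hypothetical. The official Occ3D evaluation is generally understood to accumulate intersections and unions over the whole validation set rather than averaging per sample, so treat this as an approximation of the metric, not the benchmark's code.

```python
import numpy as np

def occupancy_miou(pred, gt, visible, num_classes):
    """Mean IoU over semantic classes 0..num_classes-1 for one sample.

    pred, gt: integer voxel label grids of identical shape.
    visible:  boolean mask of the same shape; only True voxels are scored
              (mirrors evaluating only camera-visible regions).
    Assumption: a 'free'/empty label, if present, is >= num_classes, so it is
    never scored as a class but still counts as an error when mispredicted.
    """
    visible = np.asarray(visible, dtype=bool)
    pred = np.asarray(pred)[visible]
    gt = np.asarray(gt)[visible]
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both grids: skip it
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

pred = np.array([0, 1, 1, 2])
gt = np.array([0, 1, 2, 2])
print(occupancy_miou(pred, gt, np.ones(4, dtype=bool), num_classes=3))  # (1.0 + 0.5 + 0.5) / 3 ≈ 0.667
```

Skipping classes that appear in neither grid avoids rewarding or penalizing a sample for classes it simply does not contain; a dataset-level accumulator sidesteps this choice entirely.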
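The optimizer settings in the Software Dependencies and Experiment Setup rows (AdamW, weight decay 0.01, initial learning rate 2 × 10⁻⁴, cosine annealing, 100 epochs) imply a learning-rate curve that can be written in closed form. This sketch assumes per-epoch updates, no warmup, and a minimum learning rate of 0; none of these are stated in the rows, and `cosine_lr` is a hypothetical helper.

```python
import math

def cosine_lr(epoch, total_epochs=100, base_lr=2e-4, min_lr=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, min_lr at total_epochs.

    Assumptions (not stated in the paper): per-epoch updates, no warmup, min_lr = 0.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for e in (0, 25, 50, 75, 100):
    print(e, f"{cosine_lr(e):.2e}")
```

In PyTorch the same shape comes from `torch.optim.AdamW(params, lr=2e-4, weight_decay=0.01)` paired with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)` stepped once per epoch. Whether the authors step the schedule per epoch or per iteration is not stated.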
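The Experiment Setup row gives loss weights but not the full objective. One plausible reading, assumed here and not confirmed by the paper, is $\mathcal{L} = \mathcal{L}_{occ} + \lambda_{3d}\,\mathcal{L}_{box} + \sum_{\ell=1}^{6} \lambda_\ell\,\mathcal{L}_r^{(\ell)}$, with $\lambda_\ell = 0.05$ for $\ell \in \{1, 6\}$ and $0.01$ otherwise. Whether $\lambda_{3d}$ scales the box term or the occupancy term, and whether those two terms are also applied per stage, is not specified. The helpers `rendering_weights` and `total_loss` are hypothetical.

```python
def rendering_weights(num_stages=6, outer_weight=0.05, inner_weight=0.01):
    """Per-stage rendering-loss weights: the first and last stage get outer_weight."""
    return [outer_weight if stage in (1, num_stages) else inner_weight
            for stage in range(1, num_stages + 1)]

def total_loss(loss_occ, loss_box, render_losses, lambda_3d=0.2):
    """Hypothetical combined objective; assumes lambda_3d scales the box loss."""
    weights = rendering_weights(len(render_losses))
    render_term = sum(w * l for w, l in zip(weights, render_losses))
    return loss_occ + lambda_3d * loss_box + render_term

print(rendering_weights())  # [0.05, 0.01, 0.01, 0.01, 0.01, 0.05]
```

The functions only use `+` and `*`, so they accept Python floats or framework scalar tensors alike.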
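The two model sizes and the shared training settings from the Experiment Setup row can be gathered in one configuration object. This is a hypothetical sketch: the class, field, and constant names are illustrative, and the (height, width) ordering of the input resolutions is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ODGConfig:
    """Hypothetical container for the settings listed in the Experiment Setup row."""
    static_queries: int          # S, static Gaussian queries
    dynamic_queries: int         # D, dynamic Gaussian queries
    num_layers: int = 6          # L, coarse-to-fine transformer layers
    global_batch_size: int = 8
    epochs: int = 100
    base_lr: float = 2e-4
    weight_decay: float = 0.01   # AdamW weight decay

    @property
    def total_queries(self) -> int:
        return self.static_queries + self.dynamic_queries

ODG_TINY = ODGConfig(static_queries=500, dynamic_queries=100)     # "Ours-tiny"
ODG_LARGE = ODGConfig(static_queries=4000, dynamic_queries=800)   # "Ours-large"

# Input resolution per dataset; (height, width) order is an assumption.
INPUT_RESOLUTION = {"nuscenes": (256, 704), "waymo": (640, 960)}

print(ODG_TINY.total_queries, ODG_LARGE.total_queries)  # 600 4800
```

Freezing the dataclass keeps the two presets immutable, so a variant for an ablation has to be created explicitly rather than by mutating a shared object.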
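The Dataset Splits and Experiment Setup rows state that 20% of the Waymo data is sampled "matching practices in previous works" without saying how. One common approach is uniform temporal subsampling, keeping every fifth frame of each scene. The sketch below assumes that scheme; `subsample_frames` and `subsample_scenes` are hypothetical helpers, and the authors' own code is not publicly available according to the Open Source Code row.

```python
def subsample_frames(frames, fraction=0.2):
    """Uniform temporal subsampling: keep every k-th frame, k = round(1 / fraction).

    Assumption: the paper only says 20% of the data is sampled; the
    every-fifth-frame scheme here is illustrative, not confirmed.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    step = round(1 / fraction)  # 0.2 -> keep every 5th frame
    return frames[::step]

def subsample_scenes(scenes, fraction=0.2):
    """Apply subsample_frames independently to each scene's ordered frame list."""
    return {name: subsample_frames(frames, fraction) for name, frames in scenes.items()}

print(subsample_scenes({"scene_a": list(range(12))}))  # {'scene_a': [0, 5, 10]}
```

Sampling per scene rather than over a flat frame list keeps every scene represented, which matters if the split is defined at the scene level (798 training and 202 validation scenes).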