Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Authors: Jiaxin Huang, Runnan Chen, Ziwen Li, Zhengqing Gao, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations of various challenging indoor scene benchmarks demonstrate that, even without labeled 3D training data, MLLM-For3D outperforms existing 3D reasoning segmentation methods, effectively interpreting user intent, understanding 3D scenes, and reasoning about spatial relationships. ... 4 Experiments In this section, we present the experimental results for three challenging tasks, focusing on 3D reasoning segmentation, intention grounding. ... 4.3 Ablation Studies & Analysis
Researcher Affiliation	Collaboration	Jiaxin Huang (1), Runnan Chen (2), Ziwen Li (1), Zhengqing Gao (1), Xiao He (4), Yandong Guo (4), Mingming Gong (1, 3), Tongliang Liu (1, 2). Affiliations: (1) MBZUAI; (2) The University of Sydney; (3) The University of Melbourne; (4) AI2Robotic.
Pseudocode	No	The paper describes the methodology in prose and figures (Figure 2), but does not contain explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Code available at: https://github.com/tmllab/2025_NeurIPS_MLLM-For3D
Open Datasets	Yes	We evaluate the performance of 2D reasoning segmentation models, LISA [36] on the ScanNet++ dataset [64]... For the 3D reasoning segmentation task, we adopt Reason3D [23] (derived from ScanNet [12] and Matterport3D [3]) and Instruct3D [19] (derived from ScanNet++ v1 [64]) as benchmarks... Finally, we evaluate grounding on two 3D spatial-reasoning datasets built on ScanNet scenes: 3D-IG (Intent3D) [32] and VG w/o ON [59].
Dataset Splits	Yes	The Reason3D dataset [23] provides query-conditioned object masks on ScanNet V2 [12] and Matterport3D [3]. We follow the official splits and statistics: Matterport3D contributes 934 training and 837 validation samples, while ScanNet V2 contributes 405 training and 308 validation samples. ... The filtered Instruct3D split contains 136 training scenes and 45 validation scenes, yielding 1,034 and 321 query-answer (QA) pairs, respectively.
Hardware Specification	Yes	Training is performed on four NVIDIA A100 GPUs (40 GB each).
Software Dependencies	No	Our 3D segmentation network is implemented using MinkowskiNet14 as the backbone, built on the PyTorch framework. ... We adopt the Minkowski Engine's 3D U-Net backbone ("MinkowskiNet14"). The text mentions software components like PyTorch and Minkowski Engine/MinkowskiNet14 but does not provide specific version numbers for PyTorch or the Minkowski Engine library itself, only the specific network architecture MinkowskiNet14.
Experiment Setup	Yes	For optimization, we employ stochastic gradient descent (SGD) with momentum set to 0.9 and a weight decay of 1×10^-4. Data augmentations such as random rotation around the upright axis, random flips on point clouds, and random horizontal flips and resized crops on images were consistently applied to enhance model generalization. ... Table 5: Training configurations across different datasets (columns: Dataset, Backbone, Batch Size, LR, Epochs, GPUs, Voxel Size, Max Sweeps). ScanNet V2: MinkowskiNet14, 8, 0.10, 40, 4, 0.05 m, 1. Matterport3D: MinkowskiNet14, 4, 0.10, 40, 4, 0.05 m, 1. ScanNet++ (Instruct3D): MinkowskiNet14, 4, 0.01, 40, 4, 0.05 m, 1.
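
For readers who want to reuse the reported settings, the Experiment Setup row can be captured as a small configuration table. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `TABLE5`, and `sgd_kwargs` are hypothetical, and only the numeric values (momentum, weight decay, and the Table 5 entries) come from the quoted text. Whether the batch size is per GPU or total is not stated in the excerpt, so it is stored as reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """One row of Table 5 as quoted in the Experiment Setup entry."""
    backbone: str
    batch_size: int      # as reported; per-GPU vs. total is not stated
    lr: float
    epochs: int
    gpus: int
    voxel_size_m: float  # voxel size in metres
    max_sweeps: int


# Values transcribed from the quoted Table 5; the dict name is hypothetical.
TABLE5 = {
    "ScanNet V2": TrainConfig("MinkowskiNet14", 8, 0.10, 40, 4, 0.05, 1),
    "Matterport3D": TrainConfig("MinkowskiNet14", 4, 0.10, 40, 4, 0.05, 1),
    "ScanNet++ (Instruct3D)": TrainConfig("MinkowskiNet14", 4, 0.01, 40, 4, 0.05, 1),
}


def sgd_kwargs(dataset: str) -> dict:
    """Return SGD hyperparameters for a dataset (hypothetical helper).

    Momentum 0.9 and weight decay 1e-4 are the values quoted above;
    the learning rate is taken from the matching Table 5 row.
    """
    cfg = TABLE5[dataset]
    return {"lr": cfg.lr, "momentum": 0.9, "weight_decay": 1e-4}
```

The keys returned by `sgd_kwargs` match the keyword arguments of `torch.optim.SGD`, so in a PyTorch setup they could be passed as `torch.optim.SGD(model.parameters(), **sgd_kwargs("ScanNet V2"))`. Because no PyTorch or Minkowski Engine versions are reported, exact reproduction may still depend on the versions pinned in the released repository.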