Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

SceneForge: Enhancing 3D-text alignment with Structured Scene Compositions

Authors: Cristian Sbrolli, Matteo Matteucci

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that SceneForge delivers substantial performance gains across multiple tasks, including zero-shot classification on ModelNet, ScanObjectNN, Objaverse-LVIS, and ScanNet, as well as few-shot part segmentation on ShapeNetPart.
Researcher Affiliation	Academia	Cristian Sbrolli, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy, EMAIL; Matteo Matteucci, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy, EMAIL
Pseudocode	Yes	Input: samples p, relations s, target count P. Output: composed 3D sample c_3D. c_3D, p_prev ← A_3D(p_0); for i = 1 to n do: p_i ← A_3D(p_i); pos ← P(p_i, p_prev, s_i); p_i ← p_i + pos + Δ + ε; c_3D ← cat(c_3D, p_i); p_prev ← p_i; end for; c_3D ← A_3D(subsample(c_3D, P)); return c_3D. Algorithm 1: 3D SceneForge algorithm.
Open Source Code	Yes	We release the code for our pipeline, which can then be inserted directly into any 2D-3D-text contrastive model with minor modifications to the loss and data loader, which are described in detail in the paper (Section 3.2).
Open Datasets	Yes	We follow the standard zero-shot evaluation protocol on Objaverse-LVIS [6], ModelNet40 [22] and ScanObjectNN [20], where categories are mapped to text prompts by formatting a set of templates (e.g., "a point cloud model of a {category}") and the model is evaluated by classification accuracy. Additionally, adopting the pipeline from CLIP2 [27], we test our models on the ScanNet [5] dataset to evaluate their zero-shot performance on object instances from real-world scenarios.
Dataset Splits	Yes	We follow the standard zero-shot evaluation protocol on Objaverse-LVIS [6], ModelNet40 [22] and ScanObjectNN [20]... We adopt the standard classification benchmarks publicly released from OpenShape and publicly available standard splits from the ScanNet and ScanQA datasets.
Hardware Specification	No	SceneForge only requires an additional GPU hosting the lightweight LLM for composition. We acknowledge ISCRA for awarding this project access to the LEONARDO supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CINECA (Italy).
Software Dependencies	Yes	For caption generation, we use the Qwen2.5-7B-Instruct [24] large language model.
Experiment Setup	Yes	All variants are trained with their publicly available code and our modified loss for 200 epochs with a global batch size of 1152, α = 0.5, and a target point-cloud resolution of P = 10k points. This point budget was chosen as a trade-off between detail and efficiency, as detailed in the supplementary material.
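Algorithm 1 is quoted here only in outline. The following is a minimal Python/NumPy sketch of that composition loop under stated assumptions; it is not the authors' implementation. The augmentation A_3D (here `augment_3d`: centering, unit-radius scaling, random rotation about z), the positioning function P (here `position`, which places a new object's bounding box flush against the previous one along a relation-dependent axis), the relation vocabulary `RELATION_AXES`, the gap Δ (`margin`), the jitter ε (`jitter`), and random subsampling to the target count are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical relation vocabulary: direction in which object i is placed relative to object i-1.
RELATION_AXES = {
    "right of": np.array([1.0, 0.0, 0.0]),
    "left of": np.array([-1.0, 0.0, 0.0]),
    "behind": np.array([0.0, 1.0, 0.0]),
    "in front of": np.array([0.0, -1.0, 0.0]),
    "on top of": np.array([0.0, 0.0, 1.0]),
}


def augment_3d(points, rng):
    """Hypothetical A_3D: center, scale to unit radius, rotate randomly about the z axis."""
    centered = points - points.mean(axis=0)
    radius = max(float(np.linalg.norm(centered, axis=1).max()), 1e-8)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (centered / radius) @ rot_z.T


def position(p_i, p_prev, relation):
    """Hypothetical P(p_i, p_prev, s_i): translation that aligns the two centroids,
    then makes p_i's bounding box touch p_prev's along the relation axis."""
    axis = RELATION_AXES[relation]
    k = int(np.argmax(np.abs(axis)))
    offset = p_prev.mean(axis=0) - p_i.mean(axis=0)
    if axis[k] > 0:
        offset[k] = p_prev[:, k].max() - p_i[:, k].min()
    else:
        offset[k] = p_prev[:, k].min() - p_i[:, k].max()
    return offset


def compose_scene(samples, relations, target_count, margin=0.05, jitter=0.01, seed=0):
    """Sketch of the composition loop: samples are (N_i, 3) arrays and
    relations holds one entry per sample after the first."""
    if len(relations) != len(samples) - 1:
        raise ValueError("need exactly one relation per sample after the first")
    rng = np.random.default_rng(seed)
    p_prev = augment_3d(samples[0], rng)
    scene = p_prev
    for p_i, s_i in zip(samples[1:], relations):
        p_i = augment_3d(p_i, rng)
        pos = position(p_i, p_prev, s_i)
        delta = margin * RELATION_AXES[s_i]    # hypothetical gap Δ along the relation axis
        eps = rng.normal(0.0, jitter, size=3)  # hypothetical jitter ε
        p_i = p_i + pos + delta + eps
        scene = np.concatenate([scene, p_i], axis=0)
        p_prev = p_i
    # Random subsampling to the target count (an assumption; farthest point sampling is another option).
    idx = rng.choice(len(scene), size=target_count, replace=len(scene) < target_count)
    return augment_3d(scene[idx], rng)


if __name__ == "__main__":
    gen = np.random.default_rng(42)
    table, chair, lamp = (gen.normal(size=(4096, 3)) for _ in range(3))
    scene = compose_scene([table, chair, lamp], ["left of", "on top of"], target_count=10_000)
    print(scene.shape)  # (10000, 3)
```

The relation-to-axis lookup is the simplest stand-in for P that still respects the loop's structure (position each new object relative to the previously placed one); a real system would likely derive placements from richer scene descriptions.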
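The zero-shot protocol listed under Open Datasets (templated text prompts per category, classification accuracy as the metric) can be illustrated with a generic CLIP-style recipe, assuming prediction by cosine similarity between a 3D embedding and template-averaged text embeddings. This is a sketch, not SceneForge or OpenShape code: `toy_text_encoder` is a hypothetical placeholder for a real text encoder (it produces deterministic pseudo-random vectors with no semantics), and the second template is invented for illustration.

```python
import hashlib

import numpy as np

# The first template follows the evaluation protocol; the second is a hypothetical extra.
TEMPLATES = [
    "a point cloud model of a {}.",
    "a 3D model of a {}.",
]


def toy_text_encoder(text, dim=64):
    """Hypothetical stand-in for a CLIP-style text encoder: a deterministic
    pseudo-random unit vector per string (mechanics only, no semantics)."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "little")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)


def build_classifier(categories, encode=toy_text_encoder):
    """One L2-normalized weight vector per category: the mean embedding of its templated prompts."""
    rows = []
    for name in categories:
        embs = np.stack([encode(t.format(name)) for t in TEMPLATES])
        mean = embs.mean(axis=0)
        rows.append(mean / np.linalg.norm(mean))
    return np.stack(rows)  # shape: (num_categories, dim)


def zero_shot_predict(shape_embs, class_weights):
    """Cosine similarity between normalized 3D embeddings and class weights; the highest score wins."""
    shape_embs = shape_embs / np.linalg.norm(shape_embs, axis=1, keepdims=True)
    return (shape_embs @ class_weights.T).argmax(axis=1)


def top1_accuracy(preds, labels):
    return float((np.asarray(preds) == np.asarray(labels)).mean())


if __name__ == "__main__":
    categories = ["chair", "table", "lamp"]
    weights = build_classifier(categories)
    labels = np.array([0, 1, 2, 2, 0])
    # Fake "3D encoder" outputs: the true class weight plus a little noise.
    noise = np.random.default_rng(0).normal(scale=1e-3, size=(len(labels), weights.shape[1]))
    preds = zero_shot_predict(weights[labels] + noise, weights)
    print(preds, top1_accuracy(preds, labels))
```

In a real evaluation, `weights` would come from the trained text encoder and the 3D embeddings from the trained point-cloud encoder; averaging over several templates is a common way to reduce sensitivity to any single prompt wording.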