Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations

Authors: Gaia Di Lorenzo, Federico Tombari, Marc Pollefeys, Daniel Barath

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Evaluations on two challenging real-world datasets demonstrate that Object-X achieves high-fidelity novel-view synthesis comparable to standard 3D Gaussian Splat reconstruction, while significantly improving geometric accuracy. Moreover, Object-X achieves competitive performance with specialized methods in scene alignment and localization.
Researcher Affiliation Collaboration ¹ETH Zurich, ²Google, ³Microsoft
Pseudocode No The paper describes the methodology in prose and through figures, but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes The code is available at https://github.com/gaiadilorenzo/object-x.
Open Datasets Yes Evaluations on two challenging real-world datasets demonstrate that Object-X achieves high-fidelity novel-view synthesis comparable to standard 3D Gaussian Splat reconstruction, while significantly improving geometric accuracy. Moreover, Object-X achieves competitive performance with specialized methods in scene alignment and localization. ... Datasets. The 3RScan dataset (9) consists of 1,335 annotated indoor scenes ... ScanNet. To evaluate generalization, we test on ScanNet (4) without training our model on it.
Dataset Splits Yes The 3RScan dataset (9) consists of 1,335 annotated indoor scenes covering 432 distinct spaces, with 1,178 scenes (385 rooms) used for training and 157 scenes (47 rooms) reserved for validation and testing. ... we reorganized the original validation split, allocating 34 scenes (17 rooms) for validation and 123 scenes (30 rooms) for testing. ... We use 77 test scenes from the split defined in (15).
Hardware Specification Yes Implementation details. All experiments are conducted on a machine with an A100 GPU with 80GB of RAM.
Software Dependencies No The paper mentions software components like Open3D (40), DINOv2 (19), and Metric3D (6) but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup Yes Implementation details. All experiments are conducted on a machine with an A100 GPU with 80GB of RAM. During sparsification, a threshold of 0.5 is applied to the predicted occupancy. The mesh is constructed using a voxel size of 0.015 and an SDF truncation value of 0.04. ... optimization for 7,000 iterations using their default hyperparameter settings. ... For the sparse transformer-based encoder and decoder, we apply gradient clipping at a threshold of 0.01. This is crucial for stabilizing the training process and preventing excessively large updates within the structured latent space. Optimization is conducted using the AdamW optimizer with a learning rate of 1 × 10⁻⁴. ... A higher learning rate of 1 × 10⁻³ is utilized in this phase. ... a lower learning rate of 1 × 10⁻⁴ is adopted.
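
For readers trying to reproduce the setup, the values quoted in the Experiment Setup row can be gathered into one configuration object. The sketch below is illustrative only and is not taken from the authors' repository: the class name, field names, and the `sparsify` helper are hypothetical, and the strict `>` comparison is an assumption. The excerpt elides which training phases use the higher and lower learning rates and which method runs for 7,000 iterations, so the field names stay generic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    All field names are hypothetical; only the values come from the report.
    """
    occupancy_threshold: float = 0.5      # applied during sparsification
    voxel_size: float = 0.015             # mesh construction (units not stated)
    sdf_truncation: float = 0.04          # mesh construction (units not stated)
    optimization_iterations: int = 7_000  # the excerpt elides which method this is
    grad_clip_threshold: float = 0.01     # sparse transformer encoder and decoder
    optimizer: str = "AdamW"
    base_learning_rate: float = 1e-4
    higher_learning_rate: float = 1e-3    # phase not named in the excerpt
    lower_learning_rate: float = 1e-4     # phase not named in the excerpt


def sparsify(occupancy, threshold):
    """Return indices of cells whose predicted occupancy exceeds the threshold.

    Assumption: a strict '>' comparison; the report does not say whether
    values equal to the threshold are kept.
    """
    return [i for i, p in enumerate(occupancy) if p > threshold]


setup = ReportedSetup()
# Toy occupancy predictions, made up for illustration.
kept = sparsify([0.1, 0.5, 0.73, 0.95], setup.occupancy_threshold)
print(kept)
```

Keeping the values in a frozen dataclass makes accidental changes during a reproduction attempt fail loudly, which is useful when comparing results against the reported numbers.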