Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Interactive Cross-modal Learning for Text-3D Scene Retrieval

Authors: Yanglin Feng, Yongxiang Li, Yuan Sun, Yang Qin, Dezhong Peng, Peng Hu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results on three datasets demonstrate the superiority of IDeal. ... We conduct extensive comparison experiments on text-3D scene datasets. ... In this section, we conduct an ablation study to evaluate the contribution of each proposed component to our IDeal. ... To evaluate the sensitivity of our IDeal to different hyperparameter settings, we plot the retrieval performance versus different values on ScanRefer, as shown in Figure 3.
Researcher Affiliation	Academia	Yanglin Feng¹, Yongxiang Li¹, Yuan Sun², Yang Qin¹, Dezhong Peng¹,³, Peng Hu¹. ¹College of Computer Science, Sichuan University, Chengdu, China. ²National Key Laboratory of Fundamental Algorithms and Models for Engineering Numerical Simulation, Sichuan University, Chengdu, China. ³Tianfu Jincheng Laboratory, Chengdu, China.
Pseudocode	No	The paper describes the methodology in narrative text and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks in the main body.
Open Source Code	Yes	Code is available at https://github.com/Yangl1nFeng/IDeal.
Open Datasets	Yes	We adopt the ScanNet 3D scene set along with several description sets (i.e., ScanRefer [51], Nr3D [52], Sr3D [52], and SceneDepict-3D2T [6]) to conduct experiments, where ScanRefer, Nr3D, and Sr3D are employed as query sets, and SceneDepict-3D2T is employed to simulate fine-grained memory.
Dataset Splits	No	The paper mentions using specific datasets (ScanRefer, Nr3D, Sr3D, SceneDepict-3D2T) and evaluation metrics (R@1, R@5, R@10, Rsum) but does not provide explicit details about how these datasets were split into training, validation, or test sets within the main text. It states: "More details of datasets, prompts, and additional experiments are provided in the Supplemental Material."
Hardware Specification	Yes	All methods are implemented in PyTorch and carried out on GeForce RTX 3090 GPUs.
Software Dependencies	No	The paper mentions software like "PyTorch", "DGCNN [57]", "BERT [58]", and "Qwen-7B-Instruct [59]" but does not specify their version numbers. The prompt requires specific version numbers for key software components.
Experiment Setup	Yes	1) For our retriever, tuning greater weights to interactive and reconstruction predictions helps achieve a well-balanced trade-off that fully leverages the interactive responses. Additionally, a higher feature fusion weight (e.g., α = 0.75) represents an emphasis on the integration of discriminative features from the refined descriptions, leading to more effective interaction feature fusion. 2) For our proposed questioner, using reasonable and moderate settings of k and β (e.g., k = 20, β = 2.0) enables accurate identification of informative descriptions, thereby supporting reasonable decisions on question types in the next round. 3) During domain adaptation tuning, a relatively wide range of λ and γ values in IAT (i.e., λ ∈ [0.2, 0.5] and γ ∈ [0.1, 0.8]) ensures effective contrastive adaptation and mitigates the impact of false negatives.
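
The Dataset Splits row above names the evaluation metrics R@1, R@5, R@10, and Rsum without defining them. The sketch below shows how such metrics are commonly computed in retrieval work, assuming each text query has exactly one correct scene and that Rsum is the sum of the reported recall values. The function names, the toy similarity matrix, and the choice of cutoffs are illustrative assumptions and are not taken from the paper or its code.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of queries whose correct scene appears in the top-k results.

    similarity[i][j] is a (hypothetical) score between query i and scene j;
    ground_truth[i] is the index of the single correct scene for query i.
    """
    hits = 0
    for query_idx, scores in enumerate(similarity):
        # Rank candidate scene indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if ground_truth[query_idx] in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


def rsum(similarity, ground_truth, ks=(1, 5, 10)):
    """Sum of recall values over the given cutoffs (assumed definition of Rsum)."""
    return sum(recall_at_k(similarity, ground_truth, k) for k in ks)


# Toy data: 3 text queries scored against 4 candidate scenes.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # correct scene (0) is ranked first
    [0.8, 0.6, 0.1, 0.2],  # correct scene (1) is ranked second
    [0.5, 0.4, 0.1, 0.3],  # correct scene (2) is ranked last
]
ground_truth = [0, 1, 2]

for k in (1, 2, 4):
    print(f"R@{k} = {recall_at_k(similarity, ground_truth, k):.2f}")
print(f"Rsum over (1, 2, 4) = {rsum(similarity, ground_truth, ks=(1, 2, 4)):.2f}")
```

The toy example uses cutoffs 1, 2, and 4 only because it has four candidate scenes; with a realistic gallery the default cutoffs (1, 5, 10) match the metrics listed in the table.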