Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild
Authors: Deming Li, Kaiwen Jiang, Yutao Tang, Ravi Ramamoorthi, Rama Chellappa, Cheng Peng
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments 4.1 Datasets 4.2 Evaluation and Implementation 4.3 Ablation Study 4.4 Comparisons We evaluate the performance of MS-GS and current SoTA methods on three real-world scenes with sparse inputs, one with single appearance and two with varying appearances. ... We conduct an ablation study to validate the effectiveness of our method in Table 1 and Fig. 4. ... Tables 3 and 4 together with Fig. 5 present results on these benchmarks. |
| Researcher Affiliation | Academia | Deming Li Johns Hopkins University Baltimore, MD 21218 EMAIL Kaiwen Jiang University of California, San Diego La Jolla, CA 92093 EMAIL Yutao Tang Johns Hopkins University Baltimore, MD 21218 EMAIL Ravi Ramamoorthi University of California, San Diego La Jolla, CA 92093 EMAIL Rama Chellappa Johns Hopkins University Baltimore, MD 21218 EMAIL Cheng Peng Johns Hopkins University Baltimore, MD 21218 EMAIL |
| Pseudocode | Yes | A.5 Semantic alignment algorithm Algorithm 1: Semantic Masks Prediction Input: Image In, a set of visible 2D SfM points X on In, segmentation model S, threshold THsfm, threshold THIoU. Output: Final set of masks Mfinal. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We will open-source the code and data upon acceptance. |
| Open Datasets | Yes | 4.1 Datasets We evaluate the performance of MS-GS and current SoTA methods on three real-world scenes with sparse inputs, one with single appearance and two with varying appearances. Sparse Mip-NeRF 360 Dataset [33] contains 4 outdoor and 4 indoor scenes with a complex central object or area and a detailed background. ... Sparse Phototourism Dataset [22] consists of scenes of well-known monuments. Specifically, we use "Brandenburg Gate", "Sacre Coeur", and "Trevi Fountain", following previous works [14, 12, 13, 17]. ... To this end, we introduce an unbounded drone dataset that features multi-view appearance. ... We will open-source the code and data upon acceptance. |
| Dataset Splits | Yes | Sparse Mip-NeRF 360 Dataset [33] ... We sampled 20 images from each scene for training. Sparse Phototourism Dataset [22] ... We sampled 20 images from the official training set and kept the same testing split for evaluation. ... Sparse Unbounded Drone Dataset: ... We evenly sampled 5 images from each appearance, resulting in 20 images for training each scene. We aim to establish these benchmarks for sparse-view synthesis in unconstrained settings. |
| Hardware Specification | Yes | Results are obtained with the NVIDIA RTX A5500 GPU. |
| Software Dependencies | No | We develop MS-GS based on the 3DGS implementation from NeRFStudio, called Splatfacto [45]. ... We use features extracted from blocks 3 and 4 of VGG-16 [32, 46, 47] for feature loss at different resolutions and receptive fields. |
| Experiment Setup | Yes | A.3 Implementation details We develop MS-GS based on the 3DGS implementation from NeRFStudio, called Splatfacto [45]. The baseline introduced in our ablation study Section 4.3 uses the same Splatfacto model. In Semantic Depth Alignment, the minimum number of SfM points threshold within a valid mask is 10. The intersection of two masks for merging is 0.7. We use both back-projected point cloud and MVS points for our initialization. The appearance MLP consists of 3 layers of 64 hidden units. The embedding sizes for the Gaussian feature and per-image appearance embeddings are 16 and 32, respectively. Virtual views are generated by interpolating toward one of the k = 4 nearest training cameras. We use features extracted from blocks 3 and 4 of VGG-16 [32, 46, 47] for feature loss at different resolutions and receptive fields. We set λI = 0.8, λpix = 1.0, and λfeat = 0.04. The total number of training iterations is 16,500, with the geometry-guided supervision enabled after 15,000 iterations. The same hyperparameters are maintained throughout the experiments. Results are obtained with the NVIDIA RTX A5500 GPU. |
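
For readers who want the Experiment Setup values in one place, the following is a minimal sketch that collects the quoted hyperparameters in a Python dataclass. It is not the authors' code: the class name `MSGSConfig`, every field name, and both helper functions are hypothetical. Two readings are assumptions: that "enabled after 15,000 iterations" means zero-indexed steps 15,000 and later, and that the 0.7 mask-merging value corresponds to the THIoU threshold named in Algorithm 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MSGSConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    All field names are hypothetical; only the values come from the report.
    """
    # Semantic Depth Alignment
    min_sfm_points_per_mask: int = 10      # minimum SfM points within a valid mask
    mask_merge_iou_threshold: float = 0.7  # assumed to be THIoU from Algorithm 1
    # Appearance model
    appearance_mlp_layers: int = 3
    appearance_mlp_hidden_units: int = 64
    gaussian_feature_dim: int = 16
    image_embedding_dim: int = 32
    # Virtual views
    num_nearest_cameras: int = 4           # k in the quoted text
    # Feature loss
    vgg16_feature_blocks: tuple = (3, 4)
    # Loss weights, stored as reported; how they combine is not stated here
    lambda_i: float = 0.8
    lambda_pix: float = 1.0
    lambda_feat: float = 0.04
    # Schedule
    total_iterations: int = 16_500
    geometry_supervision_start: int = 15_000
    train_images_per_scene: int = 20


def geometry_supervision_enabled(step: int, cfg: MSGSConfig) -> bool:
    """Return True once geometry-guided supervision is active.

    Assumption: steps are zero-indexed and supervision starts at step 15,000.
    """
    return step >= cfg.geometry_supervision_start


def geometry_supervised_steps(cfg: MSGSConfig) -> int:
    """Number of training steps that run with geometry-guided supervision."""
    return cfg.total_iterations - cfg.geometry_supervision_start
```

Under these assumptions, geometry-guided supervision covers only the final 1,500 of the 16,500 iterations, which may be worth keeping in mind when comparing training budgets against the baselines.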