Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
Authors: Kornel Howil, Joanna Waczynska, Piotr Borycki, Tadeusz Dziarmaga, Marcin Mazur, Przemysław Spurek
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments section is divided into four parts, each corresponding to a specific modality for which the model has been applied. We note that because CLIPGaussian operates across multiple data modalities, baselines differ by task. Within each task, we standardize the setup for fair comparisons. For comparison, we include two versions of our method which differ in hyperparameters: CLIPGaussian with standard parameters and CLIPGaussian-Light, which produces lighter stylization. Details of hyperparameters can be found in Appendix A. Additional experiments using other datasets and ablation studies are presented in Appendix B. For all 2D, 3D, Videos, 4D-objects experiments, we used the NVIDIA RTX 4090 GPU; for 4D-scenes we used NVIDIA DGX A100 GPU. |
| Researcher Affiliation | Academia | 1Jagiellonian University, Faculty of Mathematics and Computer Science; 2Jagiellonian University, Doctoral School of Exact and Natural Sciences; 3IDEAS Research Institute. Correspondence to: Kornel Howil <EMAIL>, Przemysław Spurek <EMAIL> |
| Pseudocode | No | The paper describes the method and loss function using mathematical formulas (Eq. 1-5) and architectural diagrams (Fig. 2), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available on GitHub (footnote 2: https://github.com/kornelhowil/CLIPGaussian). |
| Open Datasets | Yes | We evaluate performance on both image- and text-based stylization... Tab. 1 compares our method with Instruct-GS2GS [13] and DGE [14] for text-conditioned style transfer and StyleGaussian [16], SGSST [50], ABC-GS [51] and G-Style [21] for image-conditioned style transfer. Experiments use two objects (lego, hotdog) and two scenes (garden, bonsai), each evaluated under both image- and text-driven conditions. ... The NeRF-Synthetic dataset [55]... Mip-NeRF 360 [56]... The neural 3D video dataset coffee_martini (DyNeRF) [54]... The D-NeRF dataset [60]... MS-COCO [62]... DAVIS dataset [63]... Ultra Video Group (UVG) dataset [64]. |
| Dataset Splits | Yes | For DAVIS dataset we used videos with resolution 854px × 480px and for UVG we downscaled videos to 960px × 540px. ... The neural 3D video dataset coffee_martini (DyNeRF) [54] provides time-synchronized and calibrated multiview video sequences capturing complex 4D dynamic scenes. The D-NeRF dataset [60] consists of seven moving objects, with the constraint that only one camera view is accessible at any given time step. ... In our experiments we use the first 24 frames following the data loader provided in [5] to show the capabilities of CLIPGaussian in real 4D scenes. |
| Hardware Specification | Yes | For all 2D, 3D, Videos, 4D-objects experiments, we used the NVIDIA RTX 4090 GPU; for 4D-scenes we used NVIDIA DGX A100 GPU. |
| Software Dependencies | No | The paper mentions using pre-trained 'VGG-19' and 'CLIP' models, specifically 'ViT-B/32 CLIP model [53]' and 'ViT-L/14 CLIP model [53]'. However, it does not specify version numbers for general software dependencies like Python, PyTorch, or other libraries used for implementation. |
| Experiment Setup | Yes | Our model is trained for 5000 steps without densification or pruning using loss function described in the section 3. If not stated otherwise we set λb = 1000, λp = 90, λd = 5, λc = 0.8, patch_size = 128, num_patch = 64 and feature_lr = 0.01. For the CLIPGaussian-Light variant we set feature_lr = 0.002 and λp = 35. |
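
The reported experiment setup can be written down as a small configuration object, which makes the difference between the standard and Light variants explicit. The following is only a sketch: the class `StyleTransferConfig`, the `changed_fields` helper, and the field names `lambda_b`, `lambda_p`, `lambda_d`, `lambda_c`, `steps`, and `densify_or_prune` are hypothetical and are not taken from the CLIPGaussian repository; only the values, and the names `patch_size`, `num_patch`, and `feature_lr`, come from the quoted text.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class StyleTransferConfig:
    """Hypothetical container for the hyperparameters quoted in the table.

    Field names are illustrative; they are not the repository's actual
    command-line arguments or config keys.
    """
    steps: int = 5000                # training steps
    densify_or_prune: bool = False   # training runs without densification or pruning
    lambda_b: float = 1000.0         # loss weight λb
    lambda_p: float = 90.0           # loss weight λp
    lambda_d: float = 5.0            # loss weight λd
    lambda_c: float = 0.8            # loss weight λc
    patch_size: int = 128
    num_patch: int = 64
    feature_lr: float = 0.01


def changed_fields(base, variant):
    """Return {field name: (base value, variant value)} for fields that differ."""
    return {
        f.name: (getattr(base, f.name), getattr(variant, f.name))
        for f in fields(base)
        if getattr(base, f.name) != getattr(variant, f.name)
    }


# Standard CLIPGaussian settings as reported.
DEFAULT = StyleTransferConfig()

# CLIPGaussian-Light overrides only feature_lr and λp.
LIGHT = replace(DEFAULT, feature_lr=0.002, lambda_p=35.0)

if __name__ == "__main__":
    print(changed_fields(DEFAULT, LIGHT))
```

Deriving the Light variant with `dataclasses.replace` keeps every other value tied to the standard settings, so the two configurations differ only where the quoted text says they do.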