Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

CLiFT: Compressive Light-Field Tokens for Compute Efficient and Adaptive Neural Rendering

Authors: Zhengqing Wang, Yuefan Wu, Jiacheng Chen, Fuyang Zhang, Yasutaka Furukawa

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on Real Estate10K and DL3DV datasets quantitatively and qualitatively validate our approach, achieving significant data reduction with comparable rendering quality and the highest overall rendering score, while providing trade-offs of data size, rendering quality, and rendering speed. Check demos and code on the project page: https://clift-nvs.github.io.
Researcher Affiliation | Collaboration | Zhengqing Wang¹, Yuefan Wu¹, Jiacheng Chen¹, Fuyang Zhang¹, Yasutaka Furukawa¹,² (¹Simon Fraser University, ²Wayve)
Pseudocode | Yes | Algorithm 1: Token Selection Algorithm
Open Source Code | No | Justification: We will release the code and model in the future.
Open Datasets | Yes | We use two scene datasets, Real Estate10K [37] and DL3DV [16], following the recent literature [3, 12, 34]. We preprocess videos and create training/testing video clips in exactly the same way as pixelSplat [2] for Real Estate10K and DepthSplat [34] for DL3DV. The only difference is the number of images used from each clip for training and testing. Concretely, LVSM [12], MVSplat [3], and DepthSplat [34] used 2 images for training and testing on Real Estate10K. MVSplat and DepthSplat used 2-6 images for training and testing on DL3DV. We use 4-6 images for training and 4-8 images for testing in both datasets to handle larger scenes. (Footnote 2: Real Estate10K is under the Creative Commons Attribution 4.0 International License. DL3DV is under the DL3DV-10K Terms of Use and the Creative Commons Attribution-NonCommercial 4.0 International License.)
Dataset Splits | Yes | We use 4-6 images for training and 4-8 images for testing in both datasets to handle larger scenes.
Hardware Specification | Yes | We use four NVIDIA RTX A100 GPUs to train our model. Training takes approximately 3 days on Real Estate10K and 5 days on DL3DV.
Software Dependencies | No | During training, we precompute cluster assignments after the multi-view encoder training and before the condensation training, using faiss.Kmeans, which supports GPU acceleration. At test time, we use sklearn.cluster.KMeans for better accuracy.
Experiment Setup | Yes | For experiments on the 256×256 Real Estate10K, the first and second stages train for 90,000 steps with a batch size of 64 and 50,000 steps with a batch size of 80, respectively. For DL3DV (256×448), we finetune the Real Estate10K-pretrained model for 100,000 steps with a batch size of 24 in the first stage and 32 in the second stage. Both datasets use the same cosine learning rate scheduler with a 2,500-step warmup. The peak learning rate is 4×10⁻⁴ for Real Estate10K and 2×10⁻⁴ for DL3DV, with the learning rate of the renderer scaled by 0.1 in the second stage.
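The Hardware Specification row gives enough to estimate the training compute. The short Python sketch below converts the reported wall-clock times into GPU-hours, assuming "approximately 3 days" and "5 days" mean continuous 24-hour days on all four GPUs. `gpu_hours` is a hypothetical helper, not part of the authors' code.

```python
# Rough compute budget from the Hardware Specification row.
# Assumption: each reported "day" is 24 hours of training on all four GPUs.
NUM_GPUS = 4
HOURS_PER_DAY = 24


def gpu_hours(days: float, num_gpus: int = NUM_GPUS) -> float:
    """Return total GPU-hours for a run lasting `days` wall-clock days (hypothetical helper)."""
    return days * HOURS_PER_DAY * num_gpus


budget = {
    "Real Estate10K": gpu_hours(3),  # 3 days x 24 h x 4 GPUs
    "DL3DV": gpu_hours(5),           # 5 days x 24 h x 4 GPUs
}

for dataset, hours in budget.items():
    print(f"{dataset}: ~{hours:.0f} GPU-hours")
```

Under that reading, Real Estate10K training is roughly 288 GPU-hours and DL3DV roughly 480. Because the DL3DV model is finetuned from the Real Estate10K checkpoint, the end-to-end DL3DV cost would be about 768 GPU-hours if the 5 days exclude that pretraining; the source does not say which.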
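The Software Dependencies row names two k-means implementations: faiss.Kmeans to precompute cluster assignments offline during training, and sklearn.cluster.KMeans at test time. A minimal Python sketch of that split follows. The helper name `cluster_tokens`, the input layout (one row per token feature vector), the parameter choices (`niter=20`, `n_init=10`), and reading training-time assignments via a nearest-centroid search are all assumptions; the row does not say exactly what is clustered or how many clusters are used.

```python
import numpy as np


def cluster_tokens(tokens: np.ndarray, num_clusters: int, training: bool,
                   seed: int = 0) -> np.ndarray:
    """Return one cluster id per token (hypothetical helper).

    tokens: (N, D) array, one feature vector per token (assumed layout).
    training=True  -> faiss.Kmeans, GPU-accelerated when a GPU is visible,
                      used offline to precompute assignments.
    training=False -> sklearn.cluster.KMeans, used at test time.
    """
    # faiss requires float32, C-contiguous input; sklearn accepts it too.
    x = np.ascontiguousarray(tokens, dtype=np.float32)
    if training:
        import faiss  # imported lazily so the test-time path does not need faiss

        kmeans = faiss.Kmeans(x.shape[1], num_clusters, niter=20, seed=seed,
                              gpu=faiss.get_num_gpus() > 0)
        kmeans.train(x)
        # Assign each token to its nearest centroid.
        _, assignments = kmeans.index.search(x, 1)
        return assignments.ravel().astype(np.int64)

    from sklearn.cluster import KMeans

    # Several restarts (n_init) favour clustering quality over speed.
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(x).astype(np.int64)
```

This mirrors the reasoning given in the row: the GPU-accelerated faiss path suits a one-off preprocessing pass over the training data, while the scikit-learn path, here with several restarts, is used where the authors prefer accuracy.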
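The Experiment Setup row specifies a cosine learning rate schedule with a 2,500-step warmup, peak rates of 4×10⁻⁴ (Real Estate10K) and 2×10⁻⁴ (DL3DV), and a 0.1 scale on the renderer's rate in the second stage. The Python sketch below is one plausible reading, assuming a linear warmup, cosine decay to zero at the end of each stage, and a schedule that restarts for each stage. These shapes, the helper names, and the "other" parameter group are hypothetical, since the row does not state them.

```python
import math


def learning_rate(step: int, total_steps: int, peak_lr: float,
                  warmup_steps: int = 2500, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed shapes)."""
    if step < warmup_steps:
        # Linear ramp: reaches peak_lr at the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Values taken from the Experiment Setup row.
PEAK_LR = {"Real Estate10K": 4e-4, "DL3DV": 2e-4}
RENDERER_LR_SCALE_STAGE2 = 0.1


def stage2_lrs(step: int, total_steps: int, dataset: str) -> dict:
    """Per-group rates in stage 2: renderer scaled by 0.1, other params unscaled (hypothetical grouping)."""
    base = learning_rate(step, total_steps, PEAK_LR[dataset])
    return {"renderer": base * RENDERER_LR_SCALE_STAGE2, "other": base}


# Example: stage 1 on Real Estate10K runs 90,000 steps.
print(learning_rate(46_250, 90_000, PEAK_LR["Real Estate10K"]))
```

With these assumptions the Real Estate10K rate peaks at 4×10⁻⁴ after warmup, falls to half that midway through the decay phase (step 46,250 of 90,000), and reaches zero at the final step; in a framework such as PyTorch the renderer scale would typically be applied through a separate optimizer parameter group.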