Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
Authors: Weijie Wang, Donny Y. Chen, Zeyu Zhang, Duochao Shi, Akide Liu, Bohan Zhuang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of ZPressor, we integrate ZPressor into several state-of-the-art feed-forward 3DGS models, including pixelSplat [7], MVSplat [8], and DepthSplat [12], and conduct extensive experiments on large-scale benchmarks such as DL3DV-10K [17] and RealEstate10K [21]. Results show that integrating ZPressor consistently boosts the performance of baseline models under a moderate number of input views (e.g., 12 views), and helps them maintain reasonable accuracy and computational cost even with very dense inputs (e.g., 36 views, as shown in Fig. 1). Our contributions are threefold: [...] Extensive experiments on several large-scale benchmarks show that ZPressor consistently improves the performance of baseline models with a moderate number of input views, and further enhances robustness under dense input settings, where existing models typically degrade significantly. Quantitative comparisons on DL3DV [17]... Quantitative comparisons on RE10K [21]... Efficiency analysis. We report the number of Gaussians (K), inference time (ms), and peak memory (GB) of DepthSplat [12] and DepthSplat with ZPressor. |
| Researcher Affiliation | Academia | Weijie Wang¹, Donny Y. Chen², Zeyu Zhang², Duochao Shi¹, Akide Liu², Bohan Zhuang¹. ¹ZIP Lab, Zhejiang University; ²Monash University. Corresponding authors: EMAIL, EMAIL. This work was conducted while D. Y. Chen was affiliated with Monash University. |
| Pseudocode | Yes | Algorithm 1: Overview of Feed-Forward 3DGS framework with ZPressor. Input: $K$ input views $V = \{V_i\}_{i=1}^{K}$, camera poses $P = \{P_i\}_{i=1}^{K}$, the number of anchor views $N$, the number of network blocks $h$. Output: Gaussian parameters $Y = \{(\mu, \Sigma, \alpha, c)\}$. [...] Algorithm 2: Farthest Point Sampling for Anchor View Selection. Input: set of view camera positions $T = \{T_1, T_2, \ldots, T_K\}$, number of anchor views $N$. Output: indices of the selected anchor views $S = \{T_{a_1}, T_{a_2}, \ldots, T_{a_N}\}$. |
| Open Source Code | Yes | The video results, code and trained models are available on our project page: https://lhmd.top/zpressor. [...] We will open-source the complete codebase for ZPressor, our ZPressor-integrated versions of DepthSplat, MVSplat, and pixelSplat, and all associated model checkpoints. |
| Open Datasets | Yes | We validate the effectiveness of our ZPressor on the NVS task, following existing works [7, 8, 12], and conduct experiments on two large-scale datasets: DL3DV-10K (DL3DV) [17] and RealEstate10K (RE10K) [21]. [...] Following MVSplat [8], we conducted experiments using a pretrained model on the RealEstate10K (RE10K) dataset [21] (as detailed in Tab. 2) and tested its performance on the ACID dataset [68] to evaluate the generalization capabilities of our proposed ZPressor across diverse datasets. |
| Dataset Splits | Yes | DL3DV is a challenging large-scale dataset that contains 51.3 million frames from 10,510 real scenes. We used 140 benchmark scenes for testing and the remaining 9,896 scenes for training, with filtering applied to ensure that there is strictly no overlap between the training and test sets. RE10K offers a large-scale collection of indoor home tour clips, comprising 10 million frames from around 80,000 video clips sourced from public YouTube videos. It is split into 67,477 training and 7,289 testing scenes. |
| Hardware Specification | Yes | ZPressor enables existing feed-forward 3DGS models to scale to over 100 input views at 480P resolution on an 80 GB GPU by partitioning the views into anchor and support sets and using cross-attention to compress the information from the support views into the anchor views, forming the compressed latent state Z. [...] We use the same computing resources to train the baseline and our method. [...] and trained the models for 100,000 steps on A800 GPUs. [...] OOM indicates that the model cannot run inference on an 80 GB GPU. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' [71] and techniques like 'FlashAttention' [19, 70] and 'Pre-Layer Normalization' [69], but does not provide specific version numbers for software libraries or frameworks like Python, PyTorch, or TensorFlow, nor for the optimizer itself. |
| Experiment Setup | Yes | Implementation details. We use the same computing resources to train the baseline and our method. Due to the memory limit, we use 6 context views for DepthSplat and MVSplat, and 4 context views for pixelSplat. For all of our experiments, we adopted the same learning rate as the baseline, used the AdamW optimizer, and trained the models for 100,000 steps on A800 GPUs. Following the baseline setting, we use a 256 × 256 input resolution on RE10K and a 256 × 448 input resolution on DL3DV. All training losses match those of the baseline, with no additional data or regularization introduced. [...] The model was initially trained on RE10K [21] for 100,000 steps and subsequently fine-tuned on DL3DV [17] for an additional 100,000 steps. We employed the AdamW optimizer [71] with a learning rate of 2 × 10⁻⁴. [...] The learning rate was set to 2 × 10⁻⁴ for MVSplat and 1.5 × 10⁻⁴ for pixelSplat, both of which were trained for 100,000 steps. Notably, due to memory constraints, we trained the pixelSplat model incorporating ZPressor using 4 anchor views, in contrast to the 6 anchor views configured for DepthSplat and MVSplat. |
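The Pseudocode row quotes Algorithm 2, which selects anchor views by farthest point sampling (FPS) over camera positions. The row gives only the inputs and outputs, so the following is a minimal sketch of generic greedy FPS and not the authors' code. It assumes Euclidean distance between camera centers and a first anchor fixed at view index 0. The function name `farthest_point_sampling` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def farthest_point_sampling(positions: Sequence[Position], num_anchors: int,
                            start: int = 0) -> List[int]:
    """Greedily pick num_anchors view indices that are spread out in camera space.

    Assumptions (not from the paper): Euclidean distance on camera centers and
    the first anchor fixed at index `start`.
    """
    k = len(positions)
    if not 0 < num_anchors <= k:
        raise ValueError("num_anchors must be between 1 and the number of views")
    selected = [start]
    # min_dist[i]: distance from view i to its nearest already-selected anchor
    min_dist = [math.dist(p, positions[start]) for p in positions]
    while len(selected) < num_anchors:
        # The next anchor is the view farthest from every current anchor.
        nxt = max(range(k), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Refresh nearest-anchor distances with the newly added anchor.
        for i, p in enumerate(positions):
            min_dist[i] = min(min_dist[i], math.dist(p, positions[nxt]))
    return selected
```

For example, with cameras at x = 0, 1, 2, 3, 10 along a line, asking for 3 anchors returns `[0, 4, 3]`. The result contains the two extremes plus the view farthest from both of them. Each iteration costs O(K), so selection stays cheap even for 100+ views.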
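The Hardware Specification row explains how ZPressor keeps memory bounded. The views are partitioned into anchor and support sets, and cross-attention compresses the support views' information into the anchor views, forming the compressed latent state Z. The following sketch illustrates only that partition-and-attend idea under heavy simplifying assumptions that do not come from the paper:

- each view is one token;
- attention uses a single head;
- query, key, and value projections are identities;
- a residual connection is used.

The helpers `cross_attention` and `compress_views` are hypothetical names, not part of the ZPressor codebase.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(anchor: Matrix, support: Matrix) -> Matrix:
    """Anchor tokens (queries) attend to support tokens (keys = values).

    Assumption: identity projections, one head, residual add onto the anchors.
    """
    d = len(anchor[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in anchor:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in support]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, support)) for j in range(d)]
        out.append([q[j] + attended[j] for j in range(d)])
    return out


def compress_views(tokens: Matrix, anchor_idx: List[int]) -> Matrix:
    """Split per-view tokens into anchor/support sets and compress into Z."""
    anchor_set = set(anchor_idx)
    anchors = [tokens[i] for i in anchor_idx]
    support = [t for i, t in enumerate(tokens) if i not in anchor_set]
    if not support:  # nothing to compress
        return anchors
    return cross_attention(anchors, support)
```

The output has one row per anchor no matter how many support views are added. This is the property that lets downstream cost depend on the number of anchors N rather than the number of input views K.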
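The Experiment Setup row spreads the reported training hyperparameters across several quoted fragments. The following hypothetical Python config gathers them in one place for reference. The `TrainConfig` class and its field names are illustrative only. Assigning 2 × 10⁻⁴ to DepthSplat is an assumption: the excerpt states that learning rate for "the model" trained on RE10K and fine-tuned on DL3DV, without naming it explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the Experiment Setup row."""
    learning_rate: float
    context_views: int          # context/anchor views used during training
    steps: int = 100_000        # all models trained for 100,000 steps
    optimizer: str = "AdamW"


# Input resolutions per dataset, in the order quoted (256 x 256, 256 x 448).
RESOLUTION: Dict[str, Tuple[int, int]] = {"RE10K": (256, 256), "DL3DV": (256, 448)}

CONFIGS: Dict[str, TrainConfig] = {
    # Assumption: the first quoted learning rate (2e-4) refers to DepthSplat.
    "DepthSplat": TrainConfig(learning_rate=2e-4, context_views=6),
    "MVSplat": TrainConfig(learning_rate=2e-4, context_views=6),
    # pixelSplat uses 4 views because of memory constraints.
    "pixelSplat": TrainConfig(learning_rate=1.5e-4, context_views=4),
}
```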