Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Authors: Hanxue Liang, Jiawei Ren, Ashkan Mirzaei, Antonio Torralba, Ziwei Liu, Igor Gilitschenski, Sanja Fidler, Cengiz Oztireli, Huan Ling, Zan Gojcic, Jiahui Huang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we first introduce necessary implementation details in 4.1. We evaluate the performance of BTimer extensively on available dynamic scene benchmarks 4.2, and demonstrate its backward compatibility with static scenes 4.3. Ablation studies are found in 4.4.
Researcher Affiliation | Collaboration | Hanxue Liang1,2, Jiawei Ren1,3, Ashkan Mirzaei1,4, Antonio Torralba1,5, Ziwei Liu3, Igor Gilitschenski4, Sanja Fidler1,4,6, Cengiz Oztireli2, Huan Ling1,4,6, Zan Gojcic1, Jiahui Huang1; 1NVIDIA, 2University of Cambridge, 3Nanyang Technological University, 4University of Toronto, 5MIT, 6Vector Institute
Pseudocode | No | The paper describes the BTimer model and NTE module through textual descriptions and architectural diagrams (e.g., Figure 2 and 3) rather than structured pseudocode or algorithm blocks.
Open Source Code | No | Due to institutional constraints, we are not able to release the code until it is fully reviewed by the legal team. We do not have an ETA for this process. We will update the paper with a link to the code as soon as it is available.
Open Datasets | Yes | We use a mixture of multiple datasets for training [49, 51, 50, 52, 53, 54, 55, 56] along with our 40K annotated dataset on PANDA-70M [57]. Note that we make sure that none of the testing scenes we show below is included in the training datasets.
Dataset Splits | Yes | We use a mixture of multiple datasets for training [49, 51, 50, 52, 53, 54, 55, 56] along with our 40K annotated dataset on PANDA-70M [57]. Note that we make sure that none of the testing scenes we show below is included in the training datasets. Following the protocol in [64], we take images from the iPhone camera as our context frames and use the frames from the 2 other stationary cameras for evaluation (totaling 3928 images of resolution 360 × 480).
Hardware Specification | Yes | Training is conducted on 32 NVIDIA A100 GPUs. [...] Measured on a single NVIDIA A100 GPU, BTimer takes 20 ms for 4-view 256^2 reconstruction, 150 ms for the same resolution with 12 views, and 1.55 s for 12-view 512^2 reconstruction. It requires less than 10 GB memory, which easily fits on a commercial-grade GPU (Result shown in Supplement).
Software Dependencies | No | Our backbone Transformer network is implemented efficiently with FlashAttention-3 [60] and FlexAttention [61]. We use gsplat [62] for robust and scalable 3DGS rasterization since the total number of 3D Gaussians generated by our model can be very large.
Experiment Setup | Yes | For BTimer, the numbers of training iterations are fixed to 90K, 90K, and 50K for Stage 1 training on 128^2, 256^2, and 512^2 resolutions, and are 10K and 5K for Stage 2 and Stage 3 dynamic scene training respectively. We use the initial learning rates of 4 × 10^-4, 2 × 10^-4 and 1 × 10^-4 for the three stages, and apply a cosine annealing schedule to smoothly decay the learning rate to zero.
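
The cosine annealing schedule quoted under Experiment Setup can be illustrated with a minimal Python sketch. This is not the authors' code, which is unreleased. It assumes the standard cosine form lr(t) = lr_0 · 0.5 · (1 + cos(π t / T)), no warmup, and a minimum learning rate of zero. The function name `cosine_annealed_lr` is hypothetical, and pairing 4 × 10^-4 with a 90K-iteration run is an assumption, since the quoted text does not state unambiguously which initial learning rate goes with which iteration count.

```python
import math


def cosine_annealed_lr(step: int, total_steps: int, initial_lr: float) -> float:
    """Learning rate at `step`, decayed from `initial_lr` to zero over `total_steps`.

    Hypothetical helper: standard cosine annealing without warmup (an assumption).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay at the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return initial_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed pairing for illustration only: 4e-4 decayed over 90K iterations.
for step in (0, 45_000, 90_000):
    print(step, cosine_annealed_lr(step, 90_000, 4e-4))
```

Under these assumptions the rate starts at its initial value, passes through half of it at the midpoint of the run, and reaches zero at the final iteration. Other runs can be explored by changing `total_steps` and `initial_lr`.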