Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos

Authors: Kaihua Chen, Tarasha Khurana, Deva Ramanan

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Empirical Analysis; 4.1 Experimental setup; 4.2 Comparison to state-of-the-art; 4.3 Ablation studies
Researcher Affiliation | Academia | Kaihua Chen, Tarasha Khurana, Deva Ramanan (Carnegie Mellon University)
Pseudocode | No | The paper describes its methodology in prose and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We include a zip file of the code in supplement with some starter instructions. More in-depth details about running all code will be publicly released on GitHub after acceptance.
Open Datasets | Yes | Datasets: We train CogNVS on four in-the-wild video datasets, SA-V [59], TAO [13], Youtube-VOS [86], and DAVIS [53]. We sample 3000, 3000, 4000 and 100 videos respectively from each of the datasets, giving us a total training video pool of 10,000 videos. For pretraining, we randomly select a new subsequence of 49-frames in every epoch and construct its training pairs. For benchmarking, we follow prior work [37, 73] and use a combination of Kubric-4D, Parallel Domain-4D [73] and Dycheck [20].
Dataset Splits | Yes | We sample 3000, 3000, 4000 and 100 videos respectively from each of the datasets, giving us a total training video pool of 10,000 videos. For pretraining, we randomly select a new subsequence of 49-frames in every epoch and construct its training pairs. For benchmarking, we follow prior work [37, 73] and use a combination of Kubric-4D, Parallel Domain-4D [73] and Dycheck [20]. These have a held-out test set of 20, 20 and 5 videos each.
Hardware Specification | Yes | To fit within 48GB VRAM, we employ DeepSpeed ZeRO-2 [58] to partition model states across 8 A6000 Ada GPUs in a distributed setting. Pretraining completes in approximately 3 days. ... A single novel-view sequence generates in 5 mins on an A6000 Ada.
Software Dependencies | No | The paper mentions key software components like "CogVideoX" and "DeepSpeed ZeRO-2" but does not provide specific version numbers for these or other libraries/frameworks.
Experiment Setup | Yes | During pretraining, we load the official CogVideoX-5B-I2V checkpoint and fully finetune all 42 transformer blocks. We use the AdamW optimizer with β1 = 0.9, β2 = 0.95, and β3 = 0.98, a learning rate of 2 × 10⁻⁵, and a batch size of 8 for 12,000 steps. ... During test-time finetuning, we maintain the same optimizer and learning rate but reduce the number of steps to 200 for shorter sequences (e.g., Kubric-4D) and 400 for longer ones (e.g., DyCheck). For all experiments, we use an input resolution of ℝ^{49×480×720}, set the classifier-free guidance scale to 6, and run 50 inference steps.
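The Open Datasets and Dataset Splits rows state that a new 49-frame subsequence is drawn from each training video in every epoch. The following is a minimal sketch of that sampling step. It assumes uniform random start positions and represents each video only by its frame count. The helpers `sample_subsequence` and `epoch_clips` and the per-epoch seeding scheme are hypothetical and are not taken from the released code. How training pairs are then constructed from each clip is not modeled here.

```python
import random

CLIP_LEN = 49  # frames per training subsequence, as reported


def sample_subsequence(num_frames: int, rng: random.Random, clip_len: int = CLIP_LEN) -> range:
    """Return the frame indices of one contiguous clip_len-frame window.

    Hypothetical helper: assumes the start index is drawn uniformly from
    every position at which a full window still fits inside the video.
    """
    if num_frames < clip_len:
        raise ValueError(f"video has {num_frames} frames, need at least {clip_len}")
    start = rng.randint(0, num_frames - clip_len)  # randint's upper bound is inclusive
    return range(start, start + clip_len)


def epoch_clips(frame_counts: list[int], epoch: int, seed: int = 0) -> list[range]:
    """Draw a fresh window for every video; the RNG is reseeded per epoch,
    so each epoch sees different clips but runs are repeatable."""
    rng = random.Random(seed * 1_000_003 + epoch)
    return [sample_subsequence(n, rng) for n in frame_counts]


if __name__ == "__main__":
    counts = [120, 49, 300]  # hypothetical frame counts for three videos
    for epoch in range(2):
        clips = epoch_clips(counts, epoch)
        print(epoch, [(c.start, c.stop) for c in clips])
```

Reseeding from the epoch number is one simple way to get "a new subsequence every epoch" while keeping runs reproducible. A shared, never-reseeded generator would also satisfy the description.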
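The Hardware Specification and Experiment Setup rows together describe DeepSpeed ZeRO-2 across 8 GPUs, AdamW at a learning rate of 2 × 10⁻⁵, and a batch size of 8. The following sketch collects these settings into a DeepSpeed-style configuration dict. It makes two assumptions that the excerpt does not state. First, the batch size of 8 is taken as the global batch, split as one sample per GPU with no gradient accumulation. Second, bf16 mixed precision is assumed. Standard AdamW takes only two β coefficients, so the reported β3 = 0.98 has no slot here and is omitted. The keys follow DeepSpeed's JSON config format. The helper name `build_zero2_config` is hypothetical, and this is not the authors' configuration file.

```python
import json


def build_zero2_config(world_size: int = 8, global_batch: int = 8,
                       lr: float = 2e-5,
                       betas: tuple[float, float] = (0.9, 0.95)) -> dict:
    """Hypothetical helper: a DeepSpeed-style ZeRO stage-2 config built
    from the reported settings (assumptions are marked inline)."""
    if global_batch % world_size != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    micro = global_batch // world_size  # assumption: no gradient accumulation
    return {
        # DeepSpeed requires train_batch_size == micro * accumulation * world_size
        "train_batch_size": global_batch,
        "train_micro_batch_size_per_gpu": micro,
        "gradient_accumulation_steps": 1,
        # Only beta1 and beta2 are passed; AdamW has no third beta
        "optimizer": {"type": "AdamW", "params": {"lr": lr, "betas": list(betas)}},
        # Stage 2 partitions optimizer states and gradients across data-parallel ranks
        "zero_optimization": {"stage": 2},
        "bf16": {"enabled": True},  # assumption: precision is not reported
    }


if __name__ == "__main__":
    print(json.dumps(build_zero2_config(), indent=2))
```

The resulting dict can be written out with `json.dump` and supplied to DeepSpeed as its config file. Micro-batch size, accumulation, and precision would need to be checked against the authors' released code.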
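The Experiment Setup row fixes the classifier-free guidance scale at 6. In the standard formulation of classifier-free guidance, each denoising step combines an unconditional prediction u and a conditional prediction c as u + s · (c − u), where s is the guidance scale. The following minimal sketch applies that rule to plain Python lists standing in for model outputs. It illustrates the generic guidance rule only. It is not taken from the authors' code or from CogVideoX's pipeline internals, and `cfg_combine` is a hypothetical name.

```python
def cfg_combine(pred_uncond: list[float], pred_cond: list[float],
                scale: float = 6.0) -> list[float]:
    """Classifier-free guidance: start from the unconditional prediction and
    move along the (conditional - unconditional) direction, scaled by `scale`."""
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


if __name__ == "__main__":
    print(cfg_combine([0.0, 1.0], [0.5, 1.0]))  # [3.0, 1.0]
```

With a scale of 1, the rule returns the conditional prediction unchanged. Larger values, such as the reported 6, push samples more strongly toward the conditioning signal. Each of the 50 reported inference steps then requires both a conditional and an unconditional prediction.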