Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos
Authors: Kaihua Chen, Tarasha Khurana, Deva Ramanan
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Empirical Analysis: 4.1 Experimental setup; 4.2 Comparison to state-of-the-art; 4.3 Ablation studies |
| Researcher Affiliation | Academia | Kaihua Chen, Tarasha Khurana, Deva Ramanan; Carnegie Mellon University |
| Pseudocode | No | The paper describes its methodology in prose and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include a zip file of the code in supplement with some starter instructions. More in-depth details about running all code will be publicly released on GitHub after acceptance. |
| Open Datasets | Yes | Datasets. We train CogNVS on four in-the-wild video datasets: SA-V [59], TAO [13], YouTube-VOS [86], and DAVIS [53]. We sample 3000, 3000, 4000 and 100 videos respectively from each of the datasets, giving us a total training video pool of 10,000 videos. For pretraining, we randomly select a new subsequence of 49 frames in every epoch and construct its training pairs. For benchmarking, we follow prior work [37, 73] and use a combination of Kubric-4D, ParallelDomain-4D [73] and DyCheck [20]. |
| Dataset Splits | Yes | We sample 3000, 3000, 4000 and 100 videos respectively from each of the datasets, giving us a total training video pool of 10,000 videos. For pretraining, we randomly select a new subsequence of 49 frames in every epoch and construct its training pairs. For benchmarking, we follow prior work [37, 73] and use a combination of Kubric-4D, ParallelDomain-4D [73] and DyCheck [20]. These have held-out test sets of 20, 20, and 5 videos, respectively. |
| Hardware Specification | Yes | To fit within 48 GB VRAM, we employ DeepSpeed ZeRO-2 [58] to partition model states across 8 A6000 Ada GPUs in a distributed setting. Pretraining completes in approximately 3 days. ... A single novel-view sequence generates in 5 min on an A6000 Ada. |
| Software Dependencies | No | The paper mentions key software components like "CogVideoX" and "DeepSpeed ZeRO-2" but does not provide specific version numbers for these or other libraries/frameworks. |
| Experiment Setup | Yes | During pretraining, we load the official CogVideoX-5B-I2V checkpoint and fully finetune all 42 transformer blocks. We use the AdamW optimizer with β1 = 0.9, β2 = 0.95, and β3 = 0.98, a learning rate of 2 × 10⁻⁵, and a batch size of 8 for 12,000 steps. ... During test-time finetuning, we maintain the same optimizer and learning rate but reduce the number of steps to 200 for shorter sequences (e.g., Kubric-4D) and 400 for longer ones (e.g., DyCheck). For all experiments, we use an input resolution of 49 × 480 × 720 (frames × height × width), set the classifier-free guidance scale to 6, and run 50 inference steps. |
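
The per-epoch clip sampling quoted under Open Datasets and Dataset Splits can be sketched as follows. This is a minimal illustration, not the authors' data loader: the helper names `sample_window` and `epoch_windows`, uniform sampling of the start frame, and seeding the generator by epoch are all assumptions. The paper only states that a new 49-frame subsequence is selected every epoch, and the construction of training pairs is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

CLIP_LEN = 49  # frames per training subsequence, as stated in the paper


def sample_window(num_frames: int, clip_len: int = CLIP_LEN,
                  rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Hypothetical helper: return (start, end) of a random contiguous
    window of clip_len frames (end index exclusive)."""
    if num_frames < clip_len:
        raise ValueError(f"video has {num_frames} frames; need at least {clip_len}")
    if rng is None:
        rng = random.Random()
    # randint is inclusive on both ends, so every valid start index can be drawn
    start = rng.randint(0, num_frames - clip_len)
    return start, start + clip_len


def epoch_windows(video_lengths: Sequence[int], epoch: int,
                  base_seed: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical helper: draw one fresh window per video. Seeding by epoch
    (an assumption) gives new windows each epoch while staying reproducible."""
    rng = random.Random(base_seed + epoch)
    return [sample_window(n, rng=rng) for n in video_lengths]


if __name__ == "__main__":
    lengths = [120, 49, 300]  # hypothetical video lengths in frames
    for epoch in range(2):
        print(epoch, epoch_windows(lengths, epoch))
```

A video with exactly 49 frames always yields the window (0, 49); longer videos contribute a different slice each epoch, which is what makes resampling per epoch act as data augmentation.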
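
The DeepSpeed ZeRO-2 setup under Hardware Specification and the optimizer settings under Experiment Setup could be expressed as a DeepSpeed configuration roughly like the one below. The paper does not publish its configuration, so this is a hypothetical sketch: the helper name `zero2_config`, bf16 mixed precision, reading the batch size of 8 as a global batch (one clip per GPU across 8 GPUs), and reading the extracted learning rate as 2 × 10⁻⁵ are all assumptions. The reported β3 = 0.98 has no counterpart in AdamW's `betas` pair and is left out.

```python
import json


def zero2_config(global_batch: int = 8, num_gpus: int = 8,
                 grad_accum: int = 1, lr: float = 2e-5) -> dict:
    """Hypothetical helper: build a DeepSpeed config dict using ZeRO stage 2
    and AdamW. Batch interpretation and lr value are assumptions."""
    if global_batch % (num_gpus * grad_accum) != 0:
        raise ValueError("global batch must be divisible by num_gpus * grad_accum")
    micro = global_batch // (num_gpus * grad_accum)
    return {
        # DeepSpeed requires train_batch_size == micro * grad_accum * world_size
        "train_batch_size": global_batch,
        "train_micro_batch_size_per_gpu": micro,
        "gradient_accumulation_steps": grad_accum,
        "optimizer": {
            "type": "AdamW",
            # beta3 = 0.98 from the paper has no AdamW counterpart; omitted here
            "params": {"lr": lr, "betas": [0.9, 0.95]},
        },
        # Stage 2 shards optimizer states and gradients across ranks
        "zero_optimization": {"stage": 2},
        "bf16": {"enabled": True},  # assumption: precision is not stated in the paper
    }


if __name__ == "__main__":
    print(json.dumps(zero2_config(), indent=2))
```

Under the same global-batch assumption, 12,000 pretraining steps at a batch size of 8 correspond to 12,000 × 8 = 96,000 sampled clips. ZeRO stage 2 shards optimizer states and gradients across ranks while each GPU keeps a full copy of the parameters, which lowers per-GPU memory without the parameter-gathering communication of stage 3.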