Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
Authors: Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay, Zhensong Zhang, Greg Slabaugh, Eduardo Pérez-Pellitero
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on DyCheck, a challenging benchmark with extreme viewpoint variation, show that ViDAR outperforms all state-of-the-art baselines in visual quality and geometric consistency. We summarise our contributions as follows: ... 3. An extensive experimental evaluation, including both quantitative and qualitative comparisons with prior work, the introduction of a dynamic-region specific benchmark, as well as ablation studies isolating the impact of each component. |
| Researcher Affiliation | Collaboration | 1 Huawei Noah's Ark Lab; 2 Queen Mary University of London |
| Pseudocode | No | The paper describes the method through textual explanations and a high-level diagram (Figure 2), but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We plan to release code upon acceptance and pending internal approval procedures. |
| Open Datasets | Yes | We evaluate the performance of ViDAR on the DyCheck dataset [5]. ... In addition, we provide an evaluation of ViDAR on the NVIDIA dataset [60]. |
| Dataset Splits | Yes | We evaluate the performance of ViDAR on the DyCheck dataset [5]. ... The dataset consists of 14 casually captured scenes, 7 of which have no ground truth test views and are used for qualitative evaluation only and 7 with test views available. Due to the difficulty of obtaining accurate camera poses for all scenes, some methods choose to quantitatively evaluate on only 5 of the available 7 scenes and discard space-out and wheel. ... Following previous works [5; 13], we compute PSNR, SSIM and LPIPS on the co-visibility masked regions of the test views, which we denote with an -m addendum to each metric. We compute metrics at both half-resolution and full-resolution, and following [46], we also report results on a subset of 5 scenes which we label SoM-5. ... We provide a complementary new benchmark for the evaluation of monocular to 4D reconstruction methods, where our computed dynamic masks can be used in place of the commonly used co-visibility masks. |
| Hardware Specification | Yes | Our approach does not require a large amount of computational resources, as we use a single graphics card characterised by 60 TFLOPS at fp32. |
| Software Dependencies | No | We utilise a Stable Diffusion [36] model, specifically the pretrained Stable Diffusion XL (SDXL) [33] to improve the quality of rendered images and guide the reconstruction process. ... We train our personalised diffusion model with a Dreambooth [37] approach implemented in the diffusers library as a LoRA fine-tuning process. We use the default implementation of the SDXL model with default parameters. We change the resolution to match our input resolution (720x960). |
| Experiment Setup | Yes | We implement the monocular reconstruction step directly as MoSca [13], keeping the original hyperparameters intact. ... We change the resolution to match our input resolution (720x960). Similarly, we change the number of training iterations from the default 500 to 5000 ... we compute the loss as $L_{dyn} = \lVert E^{dyn}_{m,t} - \hat{I}^{dyn}_{m,t} \rVert_1 + \lambda_p \lVert E^{dyn}_{m,t} - \hat{I}^{dyn}_{m,t} \rVert_{vgg} + \lambda_s \lVert E^{dyn}_{m,t} - \hat{I}^{dyn}_{m,t} \rVert_{ssim}$, where $\lVert\cdot\rVert_1$ is the L1 loss, $\lVert\cdot\rVert_{vgg}$ is the perceptual loss using a pretrained VGG network [41], $\lVert\cdot\rVert_{ssim}$ is the SSIM [47] loss and $\lambda_p$ and $\lambda_s$ are hyperparameters set to 0.1. ... We increase the total number of iterations from 8000 to 40000 in order to train on the additional generated data. |
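
The dynamic-region loss quoted in the Experiment Setup row is a weighted sum of an L1 term, a VGG perceptual term and an SSIM term, with both weights set to 0.1. The sketch below only illustrates how the three terms combine; it is not the authors' implementation. The names `l1_distance` and `dynamic_loss` are hypothetical, images are simplified to flat lists of floats, the L1 term is assumed to be a mean over pixels, and the perceptual and SSIM terms are passed in as caller-supplied functions because the excerpt does not specify how they are computed.

```python
from typing import Callable, Sequence

# Illustrative simplification: an image is a flat sequence of pixel values.
Image = Sequence[float]
PairwiseTerm = Callable[[Image, Image], float]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference (assumption: mean rather than sum over pixels)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dynamic_loss(
    e_dyn: Image,
    i_hat_dyn: Image,
    perceptual: PairwiseTerm,  # stands in for the VGG-based term (hypothetical hook)
    ssim_term: PairwiseTerm,   # stands in for the SSIM-based term (hypothetical hook)
    lambda_p: float = 0.1,     # weight stated in the quoted text
    lambda_s: float = 0.1,     # weight stated in the quoted text
) -> float:
    """Weighted sum: L1 + lambda_p * perceptual + lambda_s * SSIM term."""
    return (
        l1_distance(e_dyn, i_hat_dyn)
        + lambda_p * perceptual(e_dyn, i_hat_dyn)
        + lambda_s * ssim_term(e_dyn, i_hat_dyn)
    )
```

In practice the two hooks would be replaced by real perceptual and SSIM losses from a deep learning framework; passing them as arguments keeps the weighting logic separate from those heavier dependencies.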