Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Dynamic Shadow Unveils Invisible Semantics for Video Outpainting

Authors: Ruilin Li, Hang Yu, Jiayan Qiu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiments In this section, we describe our experimental setup and present quantitative and qualitative comparisons, ablation studies, and a user study to validate the effectiveness of our method. 5.2 Comparisons to Baseline Methods Qualitative Comparisons. As shown in Fig. 3, we show qualitative results comparing our method to baseline approaches. Quantitative Comparisons. We compare our method with several state-of-the-art approaches on the widely used video outpainting benchmarks DAVIS [21] and YouTube-VOS [39], using four common evaluation metrics: PSNR, SSIM [37], LPIPS [48], and FVD [32].
Researcher Affiliation	Academia	Ruilin Li1 Hang Yu1 Jiayan Qiu2 1School of Computer Engineering and Science, Shanghai University, China 2School of Computing and Mathematical Sciences, University of Leicester, United Kingdom EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology in prose and mathematical equations within Section 3 "Working Scheme of the Proposed Framework," but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks are present.
Open Source Code	No	Answer: [No] Justification: As stated in the response above, we provided detailed instructions on how to replicate our experiment results in the main paper and further in the supplementary material. We will release our code and models upon paper acceptance.
Open Datasets	Yes	5.1 Datasets We evaluate our proposed approach on two common datasets, DAVIS [21] and YouTube-VOS [39], which are widely used benchmarks for video outpainting. Additionally, since these two datasets only provide foreground object segmentation annotations and lack instance-level shadow annotations, we introduce a dataset with shadow annotations to further validate the effectiveness of our approach. SOBA-VID [38] dataset collects 292 videos with a total of 7045 frames, capturing shadow-object interactions across diverse scenarios.
Dataset Splits	Yes	DAVIS [21] dataset collects 150 videos in total (60 for training, 30 for validation, and 60 for testing), with each video annotated with multiple foreground instances per frame. [...] SOBA-VID [38] dataset collects 292 videos with a total of 7045 frames, capturing shadow-object interactions across diverse scenarios. It emphasizes varied shadow patterns, dynamic backgrounds, occlusions, and lighting conditions. The dataset is split into 232 training videos with 5863 sparsely annotated frames and 60 testing videos with 1182 densely annotated frames, resulting in 637 shadow-object pairs in total.
Hardware Specification	Yes	Our method is implemented using PyTorch [20] and trained on eight NVIDIA RTX A6000 GPUs.
Software Dependencies	No	Our method is implemented using PyTorch [20]. Our method is built upon Stable Diffusion v1-5. The temporal modules are initialized with the weights from pretrained motion modules [9]. During the inference process, we use the PNDMScheduler [16] to guide the denoising steps in the reverse diffusion process.
Experiment Setup	Yes	Training Details Our method is implemented using PyTorch [20] and trained on eight NVIDIA RTX A6000 GPUs. During training, we employ the AdamW optimizer [17] with a fixed learning rate of 1 × 10⁻⁴ and the warm-up learning rate step is 1k. Following the training scheme of [8], the model is trained on the WebVid dataset [1] for 6 epochs with a batch size of 16, utilizing gradient accumulation to stabilize optimization under memory constraints. During the inference process, we use the PNDMScheduler [16] to guide the denoising steps in the reverse diffusion process. Also, we use 50 inference steps and a scaled linear β schedule that starts at 0.00085 and ends at 0.012.
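
The schedule values quoted in the Experiment Setup row can be made concrete with a short sketch. It is not the authors' code. It assumes that "scaled linear" means betas spaced linearly in square-root space and then squared (the common convention for Stable Diffusion schedulers), that there are 1000 training timesteps (not stated in the excerpt), that the warm-up is linear (the excerpt gives only its length, 1k steps), and that the reported learning rate is 1e-4. The function names are hypothetical.

```python
import math


def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Return betas spaced linearly in sqrt-space, then squared.

    beta_start and beta_end come from the quoted setup; num_train_timesteps=1000
    is an assumption, since the excerpt does not state it.
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def warmup_lr(step, base_lr=1e-4, warmup_steps=1000):
    """Learning rate at a 0-indexed optimizer step.

    Assumes a linear ramp over the first warmup_steps steps (the ramp shape is
    an assumption), followed by the fixed base learning rate.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


betas = scaled_linear_betas()
```

Under these assumptions, `betas[0]` equals the quoted start value and `betas[-1]` the quoted end value, while `warmup_lr` reaches the fixed rate at the end of the 1k warm-up steps and stays there.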