Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the efficacy of our proposed method, we generate stereoscopic videos from monocular videos generated by Sora, Lumiere, WALT, and Zeroscope. Both qualitative and quantitative evaluations suggest that our approach outperforms other baselines in 3D stereoscopic video generation. Our contributions are summarized as follows: We conduct comprehensive experiments that show the superiority of our approach over previous methods for 3D stereoscopic video generation. |
| Researcher Affiliation | Collaboration | Peng Dai (1,2), Feitong Tan (1), Qiangeng Xu (1), David Futschik (1), Ruofei Du (1), Sean Fanello (1), Xiaojuan Qi (2), Yinda Zhang (1); 1: Google, 2: The University of Hong Kong |
| Pseudocode | Yes | In the algorithm below, we present the detailed steps to denoise the frame matrix with spatial-temporal resampling, where we set $\mu_\theta(z_t, c, t) = \frac{1}{\sqrt{1-\beta_t}}\left(z_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(z_t, c, t)\right)$, following DDPM (Ho et al., 2020). Algorithm 1: Frame Matrix Inpainting |
| Open Source Code | No | Project page at https://daipengwa.github.io/SVG_ProjectPage/ |
| Open Datasets | No | To validate the effectiveness of our method, we conduct experiments using a variety of recent video generation models, including Sora (Brooks et al., 2024), Lumiere (Bar-Tal et al., 2024), WALT (Gupta et al., 2023), and Zeroscope (Wang et al., 2023a). These models produce diverse left videos from a wide range of input text prompts, covering subjects such as humans, animals, buildings, and imaginary content. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits for the generation models or for the monocular videos used as input. It mentions a user study split: "assigned five random videos (out of 20 videos) with five conditions to each participant" but this is for human evaluation, not model training/validation. |
| Hardware Specification | Yes | Currently, our implementation runs on an A6000 GPU and takes 8 minutes to generate stereoscopic video using only 10GB of RAM. |
| Software Dependencies | No | The paper mentions methods like DDPM (Ho et al., 2020) and RePaint (Lugmayr et al., 2022) as denoising schedulers and techniques, and that the user study software was implemented in "Unity 2023.3.0b". However, it does not provide specific version numbers for software dependencies used to implement the core methodology, such as programming languages or machine learning libraries. |
| Experiment Setup | Yes | Implementation Details. To ensure the stereo effect appears realistic, we normalize the up-to-scale depth values predicted by the depth estimation model (Yang et al., 2024) to a range of (1, 10) and set the baseline between left and right views to 0.08. The frame matrix is constructed by evenly placing 8 cameras between the left and right views, with each camera corresponding to a warped video sequence. Due to the limitations of the Zeroscope model, we currently conduct experiments on video sequences with 16 frames. Following the approach of RePaint (Lugmayr et al., 2022), we employ DDPM (Ho et al., 2020) as our denoising scheduler with 1000 total time steps T and 50 denoising steps, resulting in 20 time-step jumps per denoising step. During the initial 25 denoising steps (50 to 25), we resample 8 times at each step to establish a reasonable structure in disoccluded regions. For the remaining steps, we reduce resampling to 4 times and denoise only the right view for improved efficiency while generating stereoscopic videos. |
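The denoising schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal, hypothetical reconstruction of the schedule parameters only (1000 total timesteps, 50 denoising steps with jumps of 20, 8 resampling passes for the first 25 steps, then 4); the function name and structure are our own and are not taken from the paper's code, which is not released.

```python
# Hypothetical sketch of the RePaint-style schedule described in the paper:
# T = 1000 total timesteps, 50 denoising steps (so jumps of 20 timesteps),
# resampling 8x per step for the first 25 steps and 4x for the remaining 25.
def build_schedule(total_steps=1000, denoise_steps=50,
                   early_resamples=8, late_resamples=4):
    jump = total_steps // denoise_steps  # 20 timesteps per denoising step
    schedule = []
    for i, t in enumerate(range(total_steps - jump, -1, -jump)):
        # Early steps resample more to establish structure in disoccluded regions.
        n_resample = early_resamples if i < denoise_steps // 2 else late_resamples
        schedule.append((t, n_resample))
    return schedule

schedule = build_schedule()
print(len(schedule))   # 50 denoising steps
print(schedule[0])     # (980, 8)
print(schedule[-1])    # (0, 4)
```

Each tuple is (timestep, number of resampling passes); the actual per-step denoising and spatial-temporal resampling logic from Algorithm 1 is not reproduced here.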