Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the efficacy of our proposed method, we generate stereoscopic videos from monocular videos generated by Sora, Lumiere, WALT, and Zeroscope. Both qualitative and quantitative evaluations suggest that our approach outperforms other baselines in 3D stereoscopic video generation. Our contributions are summarized as follows: We conduct comprehensive experiments that show the superiority of our approach over previous methods for 3D stereoscopic video generation. |
| Researcher Affiliation | Collaboration | Peng Dai (1,2), Feitong Tan (1), Qiangeng Xu (1), David Futschik (1), Ruofei Du (1), Sean Fanello (1), Xiaojuan Qi (2), Yinda Zhang (1); 1: Google, 2: The University of Hong Kong |
| Pseudocode | Yes | In the algorithm below, we present the detailed steps to denoise the frame matrix with spatial-temporal resampling, where we set $\mu_\theta(z_t, c, t) = \frac{1}{\sqrt{1-\beta_t}}\left(z_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(z_t, c, t)\right)$, following DDPM (Ho et al., 2020). Algorithm 1: Frame Matrix Inpainting |
| Open Source Code | No | Project page at https://daipengwa.github.io/SVG_ProjectPage/ |
| Open Datasets | No | To validate the effectiveness of our method, we conduct experiments using a variety of recent video generation models, including Sora (Brooks et al., 2024), Lumiere (Bar-Tal et al., 2024), WALT (Gupta et al., 2023), and Zeroscope (Wang et al., 2023a). These models produce diverse left videos from a wide range of input text prompts, covering subjects such as humans, animals, buildings, and imaginary content. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits for the generation models or for the monocular videos used as input. It mentions a user study split: "assigned five random videos (out of 20 videos) with five conditions to each participant" but this is for human evaluation, not model training/validation. |
| Hardware Specification | Yes | Currently, our implementation runs on an A6000 GPU and takes 8 minutes to generate stereoscopic video using only 10GB of RAM. |
| Software Dependencies | No | The paper mentions methods like DDPM (Ho et al., 2020) and RePaint (Lugmayr et al., 2022) as denoising schedulers and techniques, and that the user study software was implemented in "Unity 2023.3.0b". However, it does not provide specific version numbers for software dependencies used to implement the core methodology, such as programming languages or machine learning libraries. |
| Experiment Setup | Yes | Implementation Details. To ensure the stereo effect appears realistic, we normalize the up-to-scale depth values predicted by the depth estimation model (Yang et al., 2024) to a range of (1, 10) and set the baseline between left and right views to 0.08. The frame matrix is constructed by evenly placing 8 cameras between the left and right views, with each camera corresponding to a warped video sequence. Due to the limitations of the Zeroscope model, we currently conduct experiments on video sequences with 16 frames. Following the approach of RePaint (Lugmayr et al., 2022), we employ DDPM (Ho et al., 2020) as our denoising scheduler with 1000 total time steps T and 50 denoising steps, resulting in 20 time-step jumps per denoising step. During the initial 25 denoising steps (50 to 25), we resample 8 times at each step to establish a reasonable structure in disoccluded regions. For the remaining steps, we reduce resampling to 4 times and denoise only the right view for improved efficiency while generating stereoscopic videos. |
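The denoising schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal, hypothetical reconstruction of the schedule parameters only (1000 total timesteps, 50 denoising steps with jumps of 20, 8 resampling passes for the first 25 steps, then 4); the function name and structure are our own and are not taken from the paper's code, which is not released.

```python
# Hypothetical sketch of the RePaint-style schedule described in the paper:
# T = 1000 total timesteps, 50 denoising steps (so jumps of 20 timesteps),
# resampling 8x per step for the first 25 steps and 4x for the remaining 25.
def build_schedule(total_steps=1000, denoise_steps=50,
                   early_resamples=8, late_resamples=4):
    jump = total_steps // denoise_steps  # 20 timesteps per denoising step
    schedule = []
    for i, t in enumerate(range(total_steps - jump, -1, -jump)):
        # Early steps resample more to establish structure in disoccluded regions.
        n_resample = early_resamples if i < denoise_steps // 2 else late_resamples
        schedule.append((t, n_resample))
    return schedule

schedule = build_schedule()
print(len(schedule))   # 50 denoising steps
print(schedule[0])     # (980, 8)
print(schedule[-1])    # (0, 4)
```

Each tuple is (timestep, number of resampling passes); the actual per-step denoising and spatial-temporal resampling logic from Algorithm 1 is not reproduced here.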