Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Imagine360: Immersive 360 Video Generation from Perspective Anchor
Authors: Jing Tan, Shuai Yang, Tong Wu, Jingwen He, Yuwei Guo, Ziwei Liu, Dahua Lin
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show Imagine360 achieves superior graphics quality and motion coherence with our curated dataset among state-of-the-art 360 video generation methods with both real and generated videos. |
| Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong; 2 Shanghai Jiao Tong University; 3 Stanford University; 4 S-Lab, Nanyang Technological University; 5 Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper includes diagrams and flowcharts (e.g., Figure 2: Pipeline of Imagine360), but no formal pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | We provide YouTube ID, time intervals, and text captions in a csv file in the supplementary material. We also provide the inference code of our proposed model in the supplementary. |
| Open Datasets | Yes | In this regard, we introduce YouTube360, a ready-to-train 360 video dataset comprising 10K curated clips from YouTube. Our dataset incorporates manual quality control and sophisticated data cleaning to select high-quality training segments with diverse and structured motion. |
| Dataset Splits | No | The paper describes the creation of the YouTube360 dataset and states its size (9,558 five-second clips). It also mentions collecting a 'test benchmark by randomly sampling videos from 360-1M [28], RealEstate10K [45], and CogVideoX [40] generations'. However, it does not provide specific percentages, sample counts, or explicit splitting methodologies for training, validation, and test sets to ensure reproducibility from its own dataset or for the overall experimental setup. |
| Hardware Specification | Yes | Training is conducted on 8 NVIDIA A100 GPUs in 50k training steps... For inference, generating a video of 512 × 1024 resolution and 32 frames takes approximately 6 minutes on average, using a single NVIDIA A100 GPU with 39 GB VRAM. |
| Software Dependencies | No | The spatial and motion modules are respectively initialized on Stable Diffusion v2.1 and [2]. While Stable Diffusion v2.1 is a specific model version, the paper does not specify other key software dependencies such as programming language versions (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow) and their versions, or CUDA versions. |
| Experiment Setup | Yes | Training is conducted on 8 NVIDIA A100 GPUs in 50k training steps, with the spatial LoRA rank and α_LoRA set to 32 and 1.0. The training resolution H × W is set to 256 × 512, the length of frames to 40, the batch size to 1, and the learning rate to 1 × 10⁻⁵. |
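
For readers who want the quoted training settings in machine-readable form, the following is a minimal sketch that collects the values from the Experiment Setup row into a Python dataclass. The class and field names are hypothetical and not taken from the authors' code. The learning rate is read as 1e-5, on the assumption that the exponent's sign was lost in extraction. The source does not say whether the batch size of 1 is per GPU or global, so it is recorded as stated.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingSetup:
    """Reported training settings; all field names are illustrative only."""

    num_gpus: int = 8            # NVIDIA A100 GPUs used for training
    train_steps: int = 50_000    # "50k training steps"
    lora_rank: int = 32          # spatial LoRA rank
    lora_alpha: float = 1.0      # alpha for the spatial LoRA
    height: int = 256            # training resolution H
    width: int = 512             # training resolution W
    num_frames: int = 40         # "length of frames"
    batch_size: int = 1          # per-GPU vs. global is not specified
    learning_rate: float = 1e-5  # assumed reading of the garbled "1 10 5"

    def pixels_per_clip(self) -> int:
        """Pixel positions in one training clip (H * W * frames)."""
        return self.height * self.width * self.num_frames


setup = TrainingSetup()
print(asdict(setup))
print(setup.pixels_per_clip())
```

Settings the paper does not report, such as framework and CUDA versions (see the Software Dependencies row), are deliberately left out rather than guessed.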