Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
PanoWan: Lifting Diffusion Video Generation Models to 360$^\circ$ with Latitude/Longitude-aware Mechanisms
Authors: Yifei Xia, Shuchen Weng, Siqi Yang, Jingqi Liu, Chengxuan Zhu, Minggui Teng, Zijian Jia, Han Jiang, Boxin Shi
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that PanoWan achieves state-of-the-art panoramic video generation performance across seven metrics, alongside robust zero-shot capabilities for various downstream tasks. |
| Researcher Affiliation | Collaboration | 1 State Key Lab of Multimedia Info. Processing, School of Computer Science, Peking University; 2 Nat'l Eng. Research Ctr. of Visual Tech., School of Computer Science, Peking University; 3 OpenBayes Information Technology Co., Ltd.; 4 Beijing Academy of Artificial Intelligence; 5 Institute for Artificial Intelligence, Peking University; 6 Nat'l Key Lab of General AI, School of Intelligence Science and Technology, Peking University; 7 School of Artificial Intelligence, Beijing University of Posts and Telecommunications |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Both the dataset and the codes will be released no later than acceptance. |
| Open Datasets | No | Both the dataset and the codes will be released no later than acceptance. |
| Dataset Splits | No | We evaluate PanoWan against existing text-based panoramic video generation methods, including 360DVD [32] and DynamicScaler [12], on the PANOVID test split containing 67 non-overlapping clips. |
| Hardware Specification | Yes | Training is conducted on 8 NVIDIA H100 GPUs for approximately 18 hours. |
| Software Dependencies | No | PanoWan is built on Wan 2.1-1.3B-T2V [33] as the video generation backbone. ... We employ Qwen-2.5-VL [2] to process each clip, generating a descriptive caption and predicting the associated POI (Point-of-Interest) category in a structured JSON format. ... The training process employs the AdamW optimizer [13] with a learning rate of 1e-4 and a batch size of 8. |
| Experiment Setup | Yes | We train PanoWan at a resolution of 448 × 896, closely matching the pre-trained resolution of this backbone model. For parameter-efficient training, LoRA [11] with a rank of 64 is applied to the query, key, value, and output projections of the attention mechanisms, as well as to the feed-forward networks. The model is trained for 200K iterations on our contributed PANOVID dataset. The training process employs the AdamW optimizer [13] with a learning rate of $1 \times 10^{-4}$ and a batch size of 8. Training is conducted on 8 NVIDIA H100 GPUs for approximately 18 hours. During each iteration, clips of 81 consecutive frames are randomly sampled from the videos. |
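
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, shown here as a minimal Python sketch. This is an illustration only, not the authors' code, which was unreleased at the time of classification. The names `TrainConfig` and `sample_clip_indices`, the fixed seed, and the 300-frame video length are hypothetical. The sketch also assumes the clip's start frame is drawn uniformly, and it does not settle whether the batch size of 8 is global or per GPU, since the quoted text does not say.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the "Experiment Setup" row; field names are hypothetical.
    height: int = 448
    width: int = 896
    lora_rank: int = 64
    learning_rate: float = 1e-4
    batch_size: int = 8
    iterations: int = 200_000
    clip_frames: int = 81
    num_gpus: int = 8
    train_hours: float = 18.0  # "approximately 18 hours" in the source


def sample_clip_indices(num_video_frames, clip_frames, rng):
    """Return the indices of a random window of consecutive frames."""
    if num_video_frames < clip_frames:
        raise ValueError("video is shorter than the requested clip length")
    # Uniform choice of the start frame is an assumption of this sketch.
    start = rng.randrange(num_video_frames - clip_frames + 1)
    return list(range(start, start + clip_frames))


cfg = TrainConfig()
rng = random.Random(0)  # fixed seed, chosen only for repeatability
indices = sample_clip_indices(300, cfg.clip_frames, rng)  # 300 frames is made up

# Back-of-the-envelope figures derived from the quoted numbers.
gpu_hours = cfg.num_gpus * cfg.train_hours
seconds_per_iteration = cfg.train_hours * 3600 / cfg.iterations

print(len(indices), indices[-1] - indices[0])
print(gpu_hours, round(seconds_per_iteration, 3))
```

Taken at face value, the reported figures give about 144 GPU-hours in total and roughly 0.324 seconds of wall-clock time per iteration. Both are approximate, because the training time is itself stated as approximate.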