Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
WISA: World simulator assistant for physics-aware text-to-video generation
Authors: Jing Wang, Ao Ma, Ke Cao, Jun Zheng, Jiasong Feng, Zhanjie Zhang, Wanyuan Pang, Xiaodan Liang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that WISA substantially improves the alignment of T2V models (such as CogVideoX and Wan2.1) with real-world physical laws, achieving notable gains on the VideoPhy benchmark. Our data, code, and models are available in the https://wisav1.github.io/WISA/. Quantitative and qualitative experimental results demonstrate WISA and WISA-80K can greatly assist basic T2V models in producing videos that better align with real-world physical laws, while introducing only a 3.5% increase in parameter count and 5% inference time. |
| Researcher Affiliation | Collaboration | Jing Wang (1,2), Ao Ma (2), Ke Cao (2), Jun Zheng (1), Jiasong Feng (2), Zhanjie Zhang (2), Wanyuan Pang (3), Xiaodan Liang (1,4,5). (1) Shenzhen Campus of Sun Yat-Sen University, (2) 360 AI Research, (3) University of Science and Technology Beijing, (4) Peng Cheng Laboratory, (5) Guangdong Key Laboratory of Big Data Analysis and Processing, EMAIL |
| Pseudocode | No | The paper describes the methodology using prose, mathematical equations (e.g., equations 1, 2, 3, 4), and architectural diagrams (e.g., Figure 4), but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | To preserve anonymity during the review process, we have not released the code and data yet. However, we commit to open-sourcing both the code and data, along with detailed instructions for reproducing the experiments, upon acceptance. (NeurIPS Paper Checklist, Item 5) |
| Open Datasets | No | To preserve anonymity during the review process, we have not released the code and data yet. However, we commit to open-sourcing both the code and data, along with detailed instructions for reproducing the experiments, upon acceptance. (NeurIPS Paper Checklist, Item 5) While we plan to release the code and model after the review process, at the time of submission no new assets are publicly released. Upon release, we will ensure that comprehensive documentation, licensing terms, and usage guidelines are provided in accordance with the NeurIPS guidelines. (NeurIPS Paper Checklist, Item 13) |
| Dataset Splits | No | The paper mentions training on the WISA-80K dataset but does not specify explicit training/validation/test splits for this dataset in the main text or supplementary materials. It mentions using prompts from VideoPhy [3] and PhyGenBench [23] for evaluation, which are external benchmarks, not splits of their own dataset. |
| Hardware Specification | Yes | All experiments are conducted on 8 A100 GPUs, each equipped with 80 GB of memory. (Supplementary Material A.3) |
| Software Dependencies | No | The paper mentions using "PySceneDetect [29]", "Qwen2.5-VL [35]", and "GPT-4o mini" but does not specify version numbers for these or any other software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | WISA is trained on our constructed WISA-80K dataset for 8,000 steps, using a learning rate of 2e-5 and a batch size of 16. For CogVideoX-5B, the video resolution is set to 480 × 720 with 49 frames per video, while for Wan2.1-14B, the resolution is 480 × 832 with 81 frames. We adopt LoRA with a rank of 128 and an alpha of 16. (Supplementary Material A.3) |
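
The reported experiment setup can be collected into a small configuration object for anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name `WisaTrainingSetup` and its field names are hypothetical, the resolutions are assumed to be height × width, the batch size of 16 is assumed to be the global batch, and the LoRA scaling factor is assumed to follow the conventional `alpha / rank` definition. Only the numeric values come from the quoted Supplementary Material A.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WisaTrainingSetup:
    """Hypothetical container for the hyperparameters quoted above."""

    base_model: str
    height: int              # assumed: first reported number is the height
    width: int               # assumed: second reported number is the width
    num_frames: int
    train_steps: int = 8_000
    learning_rate: float = 2e-5
    batch_size: int = 16     # assumed to be the global batch size
    lora_rank: int = 128
    lora_alpha: int = 16

    @property
    def lora_scale(self) -> float:
        # Conventional LoRA scaling (alpha / rank); an assumption, since the
        # paper quote does not state which scaling rule is used.
        return self.lora_alpha / self.lora_rank

    @property
    def samples_seen(self) -> int:
        # Total training samples processed, under the global-batch assumption.
        return self.train_steps * self.batch_size


# The two base models named in the setup, with their reported video shapes.
COGVIDEOX_5B = WisaTrainingSetup("CogVideoX-5B", height=480, width=720, num_frames=49)
WAN21_14B = WisaTrainingSetup("Wan2.1-14B", height=480, width=832, num_frames=81)

if __name__ == "__main__":
    for setup in (COGVIDEOX_5B, WAN21_14B):
        print(setup.base_model, setup.lora_scale, setup.samples_seen)
```

Under these assumptions, an alpha of 16 with a rank of 128 gives a LoRA scale of 0.125, and 8,000 steps at batch size 16 correspond to 128,000 training samples. Software versions are not reported, so the sketch deliberately omits any framework-specific configuration.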