Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Exploration
Authors: Luxi Chen, Zihan Zhou, Min Zhao, Yikai Wang, Ge Zhang, Wenhao Huang, Hao Sun, Ji-Rong Wen, Chongxuan Li
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of FlexWorld in generating high-quality novel view videos and flexible-view 3D scenes from single images, achieving superior visual quality under multiple popular metrics and datasets compared to existing state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Luxi Chen1,2,3, Zihan Zhou1,2,3, Min Zhao4, Yikai Wang5, Ge Zhang6, Wenhao Huang6, Hao Sun1,2,3, Ji-Rong Wen1,2,3, Chongxuan Li1,2,3. 1 Gaoling School of Artificial Intelligence, Renmin University of China; 2 Beijing Key Laboratory of Research on Large Models and Intelligent Governance; 3 Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE; 4 Dept. of Comp. Sci. & Tech., BNRist Center, THU-Bosch MLCenter, Tsinghua University; 5 School of Artificial Intelligence, Beijing Normal University; 6 ByteDance Seed |
| Pseudocode | No | The paper describes the method and workflow using natural language descriptions and figures, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/ML-GSAI/FlexWorld. |
| Open Datasets | Yes | Table 3: Codebase. We provide the URL and licenses for the open-source assets we used. [...] Datasets [18] https://github.com/DL3DV-10K/Dataset CC BY-NC 4.0 license [16] https://google.github.io/realestate10k CC BY 4.0 license [17] https://www.tanksandtemples.org CC BY 4.0 license |
| Dataset Splits | Yes | To ensure fairness, we selected the RealEstate10K (RE10K) test dataset [16] and Tanks-and-Temples (Tanks) [17] datasets, which are separate from our training dataset, for evaluation. Following previous work [12, 10], we randomly selected 300 video clips with a sample stride ranging from 1 to 3 in the RealEstate10K. In the Tanks-and-Temples dataset, we randomly sampled 100 video clips with a stride of 4 across 14 test scenes. [...] As for the training datasets, ... we obtained 10253 high-quality 3D scenes for training. |
| Hardware Specification | Yes | The model is trained at a resolution of 480 × 720, with a learning rate of 5e-5 and a batch size of 32, for a total of 5000 steps on 16 NVIDIA A800 80G GPUs. |
| Software Dependencies | Yes | To further enhance the visual quality of the generated scene, we adopt SDEdit [71] by rendering multi-view images I from fixed viewpoints, adding random noise, and applying a multi-step denoising process using the FLUX.1-dev [72] image diffusion model after the expansion of the scene. |
| Experiment Setup | Yes | The model is trained at a resolution of 480 × 720, with a learning rate of 5e-5 and a batch size of 32, for a total of 5000 steps on 16 NVIDIA A800 80G GPUs. We retain the default settings for other hyperparameters in the original I2V fine-tuning process. [...] The coefficients for the 3DGS loss function, specifically λ1, λSSIM, and λLPIPS, are set to 0.8, 0.2, and 0.3, respectively. |
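
The Experiment Setup row quotes three coefficients for the 3DGS loss (λ1 = 0.8, λSSIM = 0.2, λLPIPS = 0.3) but not the formula that combines them. The minimal sketch below assumes a plain weighted sum of three precomputed scalar loss terms. The names `LossWeights` and `combined_loss` are hypothetical, chosen for illustration, and are not taken from the FlexWorld code. How each term is computed (for example, whether the SSIM term is 1 − SSIM) is left to the caller, since the quoted text does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Coefficients as quoted in the Experiment Setup row."""
    l1: float = 0.8     # λ1
    ssim: float = 0.2   # λSSIM
    lpips: float = 0.3  # λLPIPS


def combined_loss(l1_term: float, ssim_term: float, lpips_term: float,
                  weights: LossWeights = LossWeights()) -> float:
    """Weighted sum of three scalar loss terms.

    Assumption: the terms are combined linearly; the paper's exact
    formula is not reproduced in the table above.
    """
    return (weights.l1 * l1_term
            + weights.ssim * ssim_term
            + weights.lpips * lpips_term)


if __name__ == "__main__":
    # Example with made-up per-term values, purely for illustration.
    print(combined_loss(0.05, 0.10, 0.20))
```

Note that the quoted weights sum to 1.3 rather than 1, so under this assumed form the total is not a convex combination of the three terms.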