Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Photography Perspective Composition: Towards Aesthetic Perspective Recommendation

Authors: Lujian Yao, Siming Zheng, Xinbin Yuan, Zhuoxuan Cai, Pu Wu, Jinwei Chen, Bo Li, Peng-Tao Jiang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experiments. In this section, we evaluate our approach through two main components: photography perspective composition (PPC) and perspective quality assessment (PQA). ... Main Results. [Quantitative Results] As shown in Tab. 2, we demonstrate the performance of three I2V models in generating PPC videos. [Qualitative Results] We demonstrate the versatility of our approach across three representative scenarios. ... Table 3: Quantitative Result for PQA (Tab. (a) and Tab. (b)) and PPC (Tab. (c) and Tab. (d))."
Researcher Affiliation | Industry | Lujian Yao, Siming Zheng, Xinbin Yuan, Zhuoxuan Cai, Pu Wu, Jinwei Chen, Bo Li, Peng-Tao Jiang; vivo Mobile Communication Co., Ltd.
Pseudocode | No | The paper includes mathematical equations (Eq. 1 and Eq. 2) and describes its methodological steps, but does not present any clearly labeled pseudocode or algorithm blocks in a structured, code-like format.
Open Source Code | No | "Answer: [No]. Justification: The paper has not yet been open-sourced for data and code."
Open Datasets | Yes | "We select multiple professional photography datasets, including datasets used in existing composition studies such as GAICD [51], SACD [47], FLMS [6], and FCDB [3]. Furthermore, to expand our data volume, we incorporate Unsplash (https://unsplash.com), currently the largest open-source professional photography dataset."
Dataset Splits | Yes | "Stage ①: Unpaired Videos. This stage focuses on distinguishing video quality levels. We collected approximately 5K perspective transformation videos generated by 3D reconstruction models, with expert annotators identifying roughly 1.5K high-quality and 3.5K low-quality samples. To expand the dataset, we randomly paired each high-quality video with 10 low-quality ones, creating a 15K unpaired dataset."
Hardware Specification | Yes | "This setup requires approximately 50 NVIDIA H20 GPU hours."
Software Dependencies | No | The paper mentions "Qwen2-VL-2B [40]" as a base model and "LoRA [13]" as a technique, but does not provide version numbers for these or for any other software libraries (e.g., Python, PyTorch, or CUDA).
Experiment Setup | Yes | "The training process is conducted with a batch size of 32 and a learning rate of 2 × 10⁻⁶, with the model trained over two epochs."
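The Dataset Splits entry quotes a pairing step: each of the roughly 1.5K high-quality videos is paired with 10 low-quality ones, giving about 1.5K × 10 = 15K pairs. The Python below is a minimal sketch of that step under stated assumptions. The function name `pair_high_with_low`, the placeholder video IDs, the fixed seed, and drawing each video's 10 partners without replacement are all hypothetical and are not details taken from the paper.

```python
import random
from typing import List, Tuple


def pair_high_with_low(
    high: List[str], low: List[str], k: int = 10, seed: int = 0
) -> List[Tuple[str, str]]:
    """Pair every high-quality video with k randomly chosen low-quality videos.

    Hypothetical sketch: the k partners of one high-quality video are drawn
    without replacement, so they are distinct. The source does not say how
    the sampling was done.
    """
    if k > len(low):
        raise ValueError("k cannot exceed the number of low-quality videos")
    rng = random.Random(seed)  # fixed seed so the pairing is reproducible
    pairs: List[Tuple[str, str]] = []
    for hq in high:
        for lq in rng.sample(low, k):
            pairs.append((hq, lq))  # (high-quality, low-quality)
    return pairs


# Placeholder IDs standing in for the ~1.5K high- and ~3.5K low-quality videos.
high_ids = [f"hq_{i:04d}" for i in range(1500)]
low_ids = [f"lq_{i:04d}" for i in range(3500)]
pairs = pair_high_with_low(high_ids, low_ids, k=10)
print(len(pairs))  # 15000
```

Each high-quality video appears in exactly k pairs, while a given low-quality video may be reused across different high-quality partners. This is what lets about 1.5K positives expand into about 15K pairs.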
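The Software Dependencies entry notes that no version numbers are reported. The snippet below sketches one way such information could be recorded, using Python's standard `importlib.metadata`. The helper `record_versions` and the package names `torch`, `transformers`, and `peft` are illustrative assumptions. The paper does not confirm which libraries it used.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def record_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the Python version plus the installed version of each package.

    Packages that are not installed map to None instead of raising.
    """
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Hypothetical package list, for illustration only.
    for name, ver in record_versions(["torch", "transformers", "peft"]).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Printing `name==version` pairs yields lines in the same form as a pinned requirements file.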
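The Experiment Setup entry gives three hyperparameters. The sketch below collects them in a hypothetical `TrainConfig` dataclass and counts optimizer steps. It assumes one optimizer step per batch, no gradient accumulation, and a final partial batch that is kept. For illustration only, it also assumes training runs on the 15K pairs from the Dataset Splits entry, because the report does not state which stage these settings apply to.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup entry; the class itself is hypothetical.
    batch_size: int = 32
    learning_rate: float = 2e-6
    epochs: int = 2


def total_optimizer_steps(num_samples: int, cfg: TrainConfig) -> int:
    """Count optimizer steps, assuming one step per batch and a kept partial batch."""
    steps_per_epoch = math.ceil(num_samples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainConfig()
print(total_optimizer_steps(15_000, cfg))  # 938
```

Under these assumptions, ⌈15,000 / 32⌉ = 469 steps per epoch, or 938 steps over two epochs.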