Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Authors: Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation. In this section, we report the qualitative and quantitative results and ablation studies. |
| Researcher Affiliation | Collaboration | (1) The University of Tokyo; (2) The Chinese University of Hong Kong; (3) Ant Group |
| Pseudocode | No | The paper describes the methodology in text and diagrams (Figure 3: "The overall framework of ResMaster." and Figure 4: "The overall pipeline of Structural Guidance."), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor does it include any links to a code repository. |
| Open Datasets | Yes | For the fair evaluation of model performance, we conduct quantitative experiments on the dataset of Laion-5B (Schuhmann et al. 2022) with a large amount of image-caption pairs. |
| Dataset Splits | Yes | We randomly sample 1K captions as the text prompts for the high-resolution image generation. Additionally, we randomly sample 10K images from Laion-5B as a real image set. |
| Hardware Specification | Yes | The inference time is performed on a single NVIDIA Tesla A100 40G GPU. |
| Software Dependencies | No | The paper mentions using SDXL (Podell et al. 2023) and IP-Adapter (Ye et al. 2023) but does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | In the structural guidance module, we set the initial normalized cutoff frequency D0 to 1.0 to maintain control over object structures during the early stages of generation. Within the fine-grained guidance module, we utilize IP-Adapter (Ye et al. 2023) to inject image prompts into SDXL (Podell et al. 2023) as the condition. Herein, we set the weight of the image prompt λ to 0.8. Our framework is built on the patch-based diffusion model. We follow (Du et al. 2024) to partition the entire noise into patches with a specified size [1024, 1024] and stride [64, 64]. |
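
The patch partitioning quoted in the Experiment Setup row can be illustrated with a generic sliding-window sketch. This is not the authors' code (none is released, per the table above); the function name `patch_windows`, the edge-handling rule, and the 2048 × 2048 example canvas are assumptions made for illustration, and the quoted text does not say whether the size and stride are measured in pixels or in latent units.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def patch_windows(
    height: int,
    width: int,
    size: Tuple[int, int] = (1024, 1024),   # patch size quoted in the table
    stride: Tuple[int, int] = (64, 64),     # stride quoted in the table
) -> List[Box]:
    """Hypothetical helper: list overlapping patch boxes covering a canvas."""
    patch_h, patch_w = size
    step_h, step_w = stride
    if height < patch_h or width < patch_w:
        raise ValueError("canvas must be at least as large as one patch")

    def starts(total: int, patch: int, step: int) -> List[int]:
        # Regular offsets, plus one extra offset (an assumption of this
        # sketch) so that the final patch always reaches the canvas edge.
        offsets = list(range(0, total - patch + 1, step))
        if offsets[-1] != total - patch:
            offsets.append(total - patch)
        return offsets

    return [
        (top, left, top + patch_h, left + patch_w)
        for top in starts(height, patch_h, step_h)
        for left in starts(width, patch_w, step_w)
    ]


if __name__ == "__main__":
    boxes = patch_windows(2048, 2048)
    print(len(boxes), boxes[0], boxes[-1])
```

With a stride much smaller than the patch size, neighbouring patches overlap heavily, so a patch-based sampler has to merge the overlapping regions (for example by averaging) after each denoising step; that merging step is outside the scope of this sketch.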