Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Authors: Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation. In this section, we report the qualitative and quantitative results and ablation studies. |
| Researcher Affiliation | Collaboration | (1) The University of Tokyo; (2) The Chinese University of Hong Kong; (3) Ant Group |
| Pseudocode | No | The paper describes the methodology in text and diagrams (Figure 3: "The overall framework of ResMaster." and Figure 4: "The overall pipeline of Structural Guidance."), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor does it include any links to a code repository. |
| Open Datasets | Yes | For the fair evaluation of model performance, we conduct quantitative experiments on the dataset of Laion-5B (Schuhmann et al. 2022) with a large amount of image-caption pairs. |
| Dataset Splits | Yes | We randomly sample 1K captions as the text prompts for the high-resolution image generation. Additionally, we randomly sample 10K images from Laion-5B as a real image set. |
| Hardware Specification | Yes | The inference time is performed on a single NVIDIA Tesla 40G A100 GPU. |
| Software Dependencies | No | The paper mentions using SDXL (Podell et al. 2023) and IP-Adapter (Ye et al. 2023) but does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | In the structural guidance module, we set the initial normalized cutoff frequency D0 to 1.0 to maintain control over object structures during the early stages of generation. Within the fine-grained guidance module, we utilize IP-Adapter (Ye et al. 2023) to inject image prompts into SDXL (Podell et al. 2023) as the condition. Herein, we set the weight of the image prompt λ to 0.8. Our framework is built on the patch-based diffusion model. We follow (Du et al. 2024) to partition the entire noise into patches with a specified size [1024, 1024] and stride [64,64]. |
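
The patch partition quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (the paper releases none). The function names `patch_origins` and `partition` are hypothetical. The sketch assumes that size and stride are expressed in the same coordinate space as the canvas, and that a final window is placed flush with the border when the stride does not land there. The quoted text states neither of these.

```python
from typing import List, Tuple


def patch_origins(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets of sliding windows along one axis of extent `length`."""
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    if length <= patch:
        # The canvas fits inside a single window.
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    # Assumption: add one last window flush with the border so that the
    # whole axis is covered even when the stride does not divide evenly.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def partition(
    height: int,
    width: int,
    patch: Tuple[int, int] = (1024, 1024),  # size quoted in the table
    stride: Tuple[int, int] = (64, 64),     # stride quoted in the table
) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes for every patch."""
    rows = patch_origins(height, patch[0], stride[0])
    cols = patch_origins(width, patch[1], stride[1])
    return [
        (r, c, min(r + patch[0], height), min(c + patch[1], width))
        for r in rows
        for c in cols
    ]


if __name__ == "__main__":
    boxes = partition(2048, 2048)
    print(len(boxes), boxes[0], boxes[-1])
```

With a stride much smaller than the patch size, neighbouring patches overlap heavily, so the number of patches grows quickly with the canvas size.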