Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention

Authors: Qiang Xiang, Shuang Sun, Binglei Li, Dejia Song, Huaxia Li, Yibo Chen, Xu Tang, Yao Hu, Junping Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our InstanceAssemble method achieves state-of-the-art performance under complex layout conditions, while exhibiting strong compatibility with diverse style LoRA modules.
Researcher Affiliation | Collaboration | (1) Shanghai Key Laboratory of Intelligent Information Processing, College of Computer Science and Artificial Intelligence, Fudan University; (2) Xiaohongshu Inc.; (3) Shanghai Innovation Institute
Pseudocode | No | The paper describes its method through textual descriptions and illustrative figures (e.g., Figure 2 for the pipeline) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and pretrained models are publicly available at https://github.com/FireRedTeam/InstanceAssemble.
Open Datasets | Yes | Following conventional practice, we also evaluate on the coarse-grained closed-set L2I evaluation dataset COCO [31].
Dataset Splits | No | The paper specifies the total number of images and instances in its evaluation datasets (e.g., "LayoutSAM-Eval... containing 5k images and 19k instances", "DenseLayout... consists of 5k images and 90k instances") but does not explicitly provide the training/validation/test splits needed to reproduce the experiments. It mentions training on "LayoutSAM [61]" but does not detail specific splits for that training.
Hardware Specification | Yes | All models are trained on LayoutSAM [61] at 1024×1024 with the Prodigy optimizer, for 380K iterations (batch size 2) on SD3-M and 300K iterations (batch size 1) on Flux.1-Dev, using 8 H800 GPUs (7 days for SD3-M; 5 days for Flux.1-Dev).
Software Dependencies | No | The paper mentions several models and tools (e.g., SD3-Medium [15], Flux.1-Dev [4], Prodigy, RAM++ [24], Grounding DINO [35], Qwen2.5-VL [43]) but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA, which are needed for a reproducible environment setup.
Experiment Setup | Yes | All models are trained on LayoutSAM [61] at 1024×1024 with the Prodigy optimizer, for 380K iterations (batch size 2) on SD3-M and 300K iterations (batch size 1) on Flux.1-Dev, using 8 H800 GPUs (7 days for SD3-M; 5 days for Flux.1-Dev). The textual-only InstanceAssemble is trained on SD3-Medium [15] and Flux.1-Dev [4], and the version with additional visual instance content is trained only on SD3-Medium. We freeze the pretrained MMDiT backbone and adapt only the Layout Encoder and LoRA modules of Assemble-MMDiT. Assemble-MMDiT is initialized from pretrained weights, and LoRA with rank 4 is applied.
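To make the reported compute budget concrete, the following back-of-the-envelope sketch converts the Hardware Specification and Experiment Setup figures into total training samples and GPU-hours. It rests on stated assumptions. The source does not say whether "batch size 2" is per GPU or global, so both readings are computed. The wall-clock days are also assumed to be continuous 24-hour days on all 8 GPUs. `TrainingRun` is a hypothetical helper and is not part of the InstanceAssemble code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Training settings as reported in the Experiment Setup row (hypothetical helper)."""
    backbone: str
    iterations: int
    batch_size: int        # reported batch size; per-GPU vs. global is not stated
    num_gpus: int
    wall_clock_days: float

    def samples_seen(self, per_gpu: bool = False) -> int:
        # Global reading: iterations * batch size.
        # Per-GPU reading: additionally multiplied by the number of GPUs.
        factor = self.num_gpus if per_gpu else 1
        return self.iterations * self.batch_size * factor

    def gpu_hours(self) -> float:
        # Assumes all GPUs run for the full wall-clock period, 24 hours per day.
        return self.num_gpus * self.wall_clock_days * 24


RUNS = [
    TrainingRun("SD3-Medium", 380_000, 2, 8, 7),
    TrainingRun("Flux.1-Dev", 300_000, 1, 8, 5),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.backbone}: {run.samples_seen():,} samples (global batch), "
            f"{run.samples_seen(per_gpu=True):,} samples (per-GPU batch), "
            f"{run.gpu_hours():,.0f} GPU-hours"
        )
```

Under the global-batch reading, the SD3-M run sees 760,000 samples over 1,344 GPU-hours, and the Flux.1-Dev run sees 300,000 samples over 960 GPU-hours. Under the per-GPU reading, the sample counts are eight times larger.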
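The Experiment Setup row states that the MMDiT backbone is frozen and that only the Layout Encoder and rank-4 LoRA modules are trained. The plain-Python sketch below shows a rank-4 LoRA adapter on a single frozen linear layer, following the standard LoRA formulation. The pretrained weight W stays untouched, and a low-rank update (alpha / r)·B·A is added to it. B is initialized to zero, so the adapted layer starts out identical to the pretrained one. The class name `LoRALinear`, the `alpha` value, the initialization scale, and the choice of which layers get adapters are all assumptions, since the source reports only the rank.

```python
import random


class LoRALinear:
    """Frozen linear layer y = W x plus a trainable low-rank update (alpha / r) * B A x.

    Hypothetical illustration of LoRA; not the InstanceAssemble implementation.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: (rank, d_in) with small random values; B: (d_out, rank) all zeros,
        # so B A = 0 and the adapted layer initially matches the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, x):
        # Plain matrix-vector product over nested lists.
        return [sum(w * xi for w, xi in zip(row, x)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)                   # frozen path
        update = self._matvec(self.B, self._matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        # Only A and B would receive gradients; the base weight stays frozen.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    layer = LoRALinear([[1, 2, 3], [4, 5, 6]], rank=4)
    print(layer.forward([1, 0, -1]), layer.trainable_parameters())
```

A d_out × d_in projection with a rank-r adapter trains r·(d_in + d_out) parameters instead of d_in·d_out. For a hypothetical 1536 × 1536 projection at rank 4, that is 12,288 trainable parameters versus 2,359,296 for the full matrix, or about 0.5%.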