UniFL: Improve Latent Diffusion Model via Unified Feedback Learning
Authors: Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Weilin Huang, Shilei Wen, Lean Fu, Guanbin Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In-depth experiments and extensive user studies validate the superior performance of our method in enhancing generation quality and inference acceleration. For instance, UniFL surpasses ImageReward by 17% in user preference in terms of generation quality and outperforms LCM and SDXL Turbo by 57% and 20% in general preference with 4-step inference. |
| Researcher Affiliation | Collaboration | Jiacheng Zhang¹,², Jie Wu², Yuxi Ren², Xin Xia², Huafeng Kuang², Pan Xie², Jiashi Li², Xuefeng Xiao², Weilin Huang², Shilei Wen², Lean Fu², Guanbin Li¹,³ (¹Sun Yat-sen University, ²Bytedance Inc, ³Peng Cheng Laboratory) |
| Pseudocode | Yes | Algorithm 1: Perceptual Feedback Learning (PeFL); a hedged sketch of this procedure appears after the table. |
| Open Source Code | Yes | Project Page: https://uni-fl.github.io/ and All checkpoints and code to reproduce our results will be publicly available on our project page upon the acceptance of our paper. |
| Open Datasets | Yes | We utilized the COCO2017 [47] train split with instance annotations and captions for structure optimization with PeFL. Additionally, we collected a human preference dataset for decoupled aesthetic feedback learning covering diverse aspects (such as color, layout, detail, and lighting). 100,000 prompts were selected for aesthetic optimization from DiffusionDB [48] via active prompt selection. During adversarial feedback learning, we use data from the aesthetic subset of LAION [49] with image aesthetic scores above 5. |
| Dataset Splits | Yes | We generate 5K images with prompts from the COCO2017 validation split and report the Fréchet Inception Distance (FID) [54] as the overall visual quality metric; see the evaluation sketch after the table. |
| Hardware Specification | Yes | Our training per stage costs around 200 A100 GPU hours. |
| Software Dependencies | No | The paper mentions specific software components such as SOLO, the DDIM scheduler, DeepLab-V3, Mask2Former, and a scene-graph parser (with a GitHub link), but it does not provide version numbers for these tools or for any other key software libraries (e.g., Python, PyTorch/TensorFlow). |
| Experiment Setup | Yes | Training setting: We utilize SOLO [50] as the instance segmentation model and the DDIM [51] scheduler with a total of 20 inference steps, with T_a = 10 and optimization steps t ∈ [0, 5] during PeFL training. For adversarial feedback learning, we initialize the adversarial reward model with the weights of the aesthetic preference reward model for details. During adversarial training, the optimization step is set to t ∈ [0, 20], encompassing the entire diffusion process. |
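
The quoted settings pin down enough of Algorithm 1 for an illustrative rendering. Below is a minimal, diffusers-style sketch of one PeFL optimization step: `unet`, `vae`, and `scheduler` stand in for the base diffusion model, its VAE, and a 20-step DDIM scheduler, `seg_model` for the frozen SOLO instance-segmentation model, and `seg_model.loss` is a hypothetical API. The mapping of T_a = 10 and t ∈ [0, 5] onto schedule indices is one plausible reading of the quoted setup, not the authors' released code.

```python
import torch

def pefl_step(unet, vae, scheduler, seg_model, images, text_emb, gt_masks):
    """One sketched PeFL step: structure feedback from a frozen segmentation
    model, backpropagated through a single x0 prediction."""
    scheduler.set_timesteps(20)
    ts = scheduler.timesteps                        # noisiest timestep first
    latents = vae.encode(images).latent_dist.sample()
    latents = latents * vae.config.scaling_factor

    # Forward-diffuse the ground-truth latents to T_a = 10 steps above clean.
    start = len(ts) - 10
    noise = torch.randn_like(latents)
    x = scheduler.add_noise(latents, noise, ts[start])

    # Denoise without gradients down to a random step t in [0, 5] above clean.
    t_rand = torch.randint(0, 6, (1,)).item()
    grad_idx = len(ts) - 1 - t_rand                 # step carrying the gradient
    with torch.no_grad():
        for i in range(start, grad_idx):
            eps = unet(x, ts[i], encoder_hidden_states=text_emb).sample
            x = scheduler.step(eps, ts[i], x).prev_sample

    # One-step x0 prediction with gradients (epsilon parameterization).
    t = ts[grad_idx]
    eps = unet(x, t, encoder_hidden_states=text_emb).sample
    a_bar = scheduler.alphas_cumprod[t]
    pred_x0 = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()

    # Decode and score structure against the COCO instance annotations.
    imgs = vae.decode(pred_x0 / vae.config.scaling_factor).sample
    return seg_model.loss(imgs, gt_masks)           # hypothetical loss API
```

The loss from `pefl_step` would be backpropagated into the U-Net, while the VAE and the segmentation model stay frozen.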
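
The FID protocol in the dataset-splits row is standard enough to sketch with off-the-shelf tooling. The snippet below uses torchmetrics; `coco_val_prompts`, `coco_val_images`, and `generate` are placeholders for the reader's COCO2017 validation loader and the model under evaluation, not names from the paper.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Placeholders: 5K COCO2017-val captions, their reference images (uint8,
# shape (N, 3, H, W)), and a `generate(prompt)` wrapper around the model.
fid = FrechetInceptionDistance(feature=2048)

for prompt, real in zip(coco_val_prompts, coco_val_images):  # 5K pairs
    fake = generate(prompt)           # uint8 tensor in [0, 255]
    fid.update(real, real=True)       # accumulate reference statistics
    fid.update(fake, real=False)      # accumulate generated statistics

print(f"FID: {fid.compute().item():.2f}")
```

With the default `normalize=False`, torchmetrics expects uint8 images in [0, 255]; a lower FID indicates generated images whose Inception statistics are closer to the references.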