PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

Authors: Kendong Liu, Zhiyu Zhu, Chuanhao Li, Hui Liu, Huanqiang Zeng, Junhui Hou

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach, showing significant improvements in the alignment of inpainted images with human preference compared with state-of-the-art methods.
Researcher Affiliation | Academia | Kendong Liu (City University of Hong Kong), Zhiyu Zhu (City University of Hong Kong), Chuanhao Li (Yale University), Hui Liu (Saint Francis University), Huanqiang Zeng (Huaqiao University), Junhui Hou (City University of Hong Kong)
Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code and dataset are publicly available at https://prefpaint.github.io.
Open Datasets | Yes | We first randomly selected 6,000, 4,000, 6,000, and 1,000 images with diverse content from ADE20K [56, 57], ImageNet [58], KITTI [59], and Div2K [60, 61] datasets, respectively.
Dataset Splits | Yes | We partitioned the dataset in Sec. 4 into training, validation and testing sets, containing 12,000, 3,000 and 2,000 prompts (with 36,000, 9,000 and 6,000 images), respectively. (A split sketch is given below the table.)
Hardware Specification | Yes | We trained the reward model with four NVIDIA GeForce RTX 3090 GPUs, each equipped with 20GB of memory. With the trained reward model, we subsequently fine-tuned the latest diffusion-based image inpainting model, namely Runway [62], on four 40GB NVIDIA GeForce RTX A6000 GPUs as our PrefPaint.
Software Dependencies | No | The paper mentions using a 'pre-trained CLIP (ViT-B) checkpoint' and 'half-precision computations' but does not specify software versions for libraries like PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | We utilized a cosine schedule to adjust the learning rate. Notably, we achieved optimal preference accuracy by fixing 70% of the layers with a learning rate of 1e-5 and a batch size of 5. ... During fine-tuning, we employed half-precision computations with a learning rate of 2e-6, and a batch size of 16. (A configuration sketch is given below the table.)
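
The quoted split is most naturally done at the prompt level, since each prompt carries three inpainted images (36,000/9,000/6,000 for 12,000/3,000/2,000 prompts). Below is a minimal sketch of such a partition; the split_prompts helper, the seed, and the use of Python's random module are illustrative assumptions, not the authors' released code.

```python
import random

def split_prompts(prompt_ids, n_train=12_000, n_val=3_000, n_test=2_000, seed=0):
    """Shuffle prompt IDs and cut them into train/val/test (hypothetical helper)."""
    ids = list(prompt_ids)
    assert len(ids) == n_train + n_val + n_test  # 17,000 prompts in total
    random.Random(seed).shuffle(ids)             # the seed is an assumed choice
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# Splitting at the prompt level keeps all three inpainted images of a prompt
# inside a single split, avoiding leakage between train, validation and test.
train_ids, val_ids, test_ids = split_prompts(range(17_000))
```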
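
The quoted hyperparameters (cosine learning-rate schedule, 70% of layers frozen, learning rate 1e-5 and batch size 5 for the reward model; half precision, learning rate 2e-6 and batch size 16 for diffusion fine-tuning) could be wired up roughly as in the PyTorch sketch below. The placeholder modules, the AdamW optimizer, and the T_max value are assumptions not stated in the paper excerpt.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def freeze_fraction(model: torch.nn.Module, fraction: float) -> None:
    """Freeze the first `fraction` of parameter tensors (assumed layer ordering)."""
    params = list(model.parameters())
    for p in params[: int(len(params) * fraction)]:
        p.requires_grad = False

# Reward-model training: 70% of layers fixed, lr 1e-5, batch size 5, cosine schedule.
reward_model = torch.nn.Sequential(                      # placeholder for the real model
    torch.nn.Linear(512, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1))
freeze_fraction(reward_model, 0.7)
reward_optimizer = torch.optim.AdamW(
    [p for p in reward_model.parameters() if p.requires_grad], lr=1e-5)
reward_scheduler = CosineAnnealingLR(reward_optimizer, T_max=10_000)  # T_max assumed
reward_batch_size = 5

# Diffusion fine-tuning: half precision, lr 2e-6, batch size 16.
inpainting_model = torch.nn.Linear(4, 4).half()          # placeholder for the real model
finetune_optimizer = torch.optim.AdamW(inpainting_model.parameters(), lr=2e-6)
finetune_batch_size = 16
```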