DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation

Authors: Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Zhendong Mao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our method can generate more text-coherent and ID-preserved images with negligible time overhead compared to the standard text-to-image generation process."
Researcher Affiliation | Collaboration | Zhuowei Chen 1,2*, Shancheng Fang 1, Wei Liu 2, Qian He 2, Mengqi Huang 1, Zhendong Mao 1. (1) University of Science and Technology of China; (2) ByteDance. {chenzw01, huangmq}@mail.ustc.edu.cn, {zdmao, fangsc}@ustc.edu.cn, {liuwei.jikun, heqian}@bytedance.com
Pseudocode | No | The paper describes the architecture and processes (e.g., the M2ID encoder and self-augmented editability learning) but does not include any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that code for the method will be released nor links to a code repository.
Open Datasets | Yes | "Our experiments are conducted on the widely used FFHQ dataset (Karras, Laine, and Aila 2019), which contains 70k high-resolution human face images."
Dataset Splits | No | The paper states the total number of training examples and mentions a test set, but it does not specify explicit validation splits or percentages (e.g., an 80/10/10 split or a sample count for validation).
Hardware Specification | Yes | "Our experiments are trained on a server with eight A100-80G GPUs, which takes about one day to complete our experiment."
Software Dependencies | No | The paper mentions using Stable Diffusion 2.1-base but does not give version numbers for other software dependencies (e.g., Python, PyTorch).
Experiment Setup | Yes | "The learning rate and batch size are set to 5e-5 and 64. The encoder is trained for 60,000 iterations. The embedding regularization weight λ is set to 1e-4. ... We use the DDIM (Song, Meng, and Ermon 2020) sampler with 30 steps during inference. The guidance scale is set to 7.5."
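The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. The paper releases no code, so the dictionary keys, variable names, and the helper function below are illustrative, not taken from the authors' implementation:

```python
# Hyperparameters as reported in the paper's experiment setup.
# All names here are illustrative; the authors' code is not public.
TRAIN_CONFIG = {
    "base_model": "stable-diffusion-2-1-base",  # backbone named in the paper
    "learning_rate": 5e-5,
    "batch_size": 64,
    "iterations": 60_000,
    "embedding_reg_weight": 1e-4,  # the regularization weight λ
}

INFER_CONFIG = {
    "sampler": "DDIM",  # Song, Meng, and Ermon 2020
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
}

def total_images_seen(cfg: dict) -> int:
    """Total training images processed = iterations x batch size."""
    return cfg["iterations"] * cfg["batch_size"]

print(total_images_seen(TRAIN_CONFIG))  # 3840000 images over 60k iterations
```

If one were to reproduce the inference setup with the Hugging Face `diffusers` library, these values would map onto a `DDIMScheduler` and a pipeline call of the form `pipe(prompt, num_inference_steps=30, guidance_scale=7.5)`; the paper itself does not specify its implementation stack.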