DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation
Authors: Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Zhendong Mao
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method can generate more text-coherent and ID-preserved images with negligible time overhead compared to the standard text-to-image generation process. |
| Researcher Affiliation | Collaboration | Zhuowei Chen1,2*, Shancheng Fang1, Wei Liu2, Qian He2, Mengqi Huang1, Zhendong Mao1 1 University of Science and Technology of China 2 Byte Dance {chenzw01, huangmq}@mail.ustc.edu.cn {zdmao, fangsc}@ustc.edu.cn {liuwei.jikun, heqian}@bytedance.com |
| Pseudocode | No | The paper describes the architecture and processes (e.g., M2ID Encoder, Self-Augmented Editability Learning) but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code for their method or a link to a code repository. |
| Open Datasets | Yes | Our experiments are conducted on the widely used FFHQ dataset (Karras, Laine, and Aila 2019), which contains 70k high-resolution human face images. |
| Dataset Splits | No | The paper states the total number of training examples and mentions a test set, but it does not specify any explicit validation dataset splits or percentages (e.g., '80/10/10 split' or specific sample counts for validation). |
| Hardware Specification | Yes | Our experiments are trained on a server with eight A100-80G GPUs, which takes about one day to complete our experiment. |
| Software Dependencies | No | The paper mentions using 'Stable Diffusion 2.1-base' but does not provide specific version numbers for other software dependencies or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | The learning rate and batch size are set to 5e-5 and 64. The encoder is trained for 60,000 iterations. The embedding regularization weight λ is set to 1e-4. ... We use the DDIM (Song, Meng, and Ermon 2020) sampler with 30 steps during inference. The guidance scale is set to 7.5. |
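The reported hyperparameters can be collected into a single configuration for reimplementation attempts. This is a minimal illustrative sketch, not the authors' code; all key names are assumptions, and only the values come from the paper.

```python
# Hypothetical configuration gathering the hyperparameters reported in the
# paper. Key names are illustrative; values are as stated by the authors.
config = {
    "learning_rate": 5e-5,          # encoder learning rate
    "batch_size": 64,
    "train_iterations": 60_000,     # encoder training iterations
    "embedding_reg_weight": 1e-4,   # λ, embedding regularization weight
    "base_model": "Stable Diffusion 2.1-base",
    "sampler": "DDIM",              # Song, Meng, and Ermon 2020
    "inference_steps": 30,
    "guidance_scale": 7.5,
}
```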