ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration

Authors: Chi-Wei Hsiao, Yu-Lun Liu, Cheng-Kun Yang, Sheng-Po Kuo, Kevin Jou, Chia-Ping Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive ablation studies for the proposed CacheKV mechanism and timestep-scaled identity loss are also conducted and reported. We construct FFHQ-Ref, a dataset consisting of 20,405 high-quality (HQ) face images with corresponding reference images, which can serve as both training and evaluation data for reference-based face restoration models."
Researcher Affiliation | Collaboration | "Chi-Wei Hsiao (1), Yu-Lun Liu (2), Cheng-Kun Yang (1), Sheng-Po Kuo (1), Yucheun Kevin Jou (1), Chia-Ping Chen (1); (1) MediaTek, (2) National Yang Ming Chiao Tung University"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | "We plan to release our dataset and model after the paper's acceptance. They are not included in the current submission because we are awaiting formal permission from the associated company."
Open Datasets | Yes | "Lastly, we construct FFHQ-Ref, a dataset consisting of 20,405 high-quality (HQ) face images with corresponding reference images, which can serve as both training and evaluation data for reference-based face restoration models."
Dataset Splits | Yes | "Finally, we identified 6,523 identities and divided them into three splits: a train split with 18,816 images of 6,073 identities, a validation split with 732 images of 300 identities, and a test split with 857 images of 150 identities."
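The split counts above imply identity-disjoint partitioning: every image of a given person lands in exactly one split, so no identity leaks from train into validation or test. A minimal sketch of such a split, assuming a mapping from identity to image filenames; the function name, API, and seed are illustrative, not from the paper:

```python
import random

def split_by_identity(identity_to_images, seed=0, val_ids=300, test_ids=150):
    """Partition identities (not individual images) into train/val/test.

    The identity counts default to the FFHQ-Ref validation/test sizes
    (300 and 150 identities); everything else goes to train.
    """
    ids = sorted(identity_to_images)          # deterministic base order
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    test = ids[:test_ids]
    val = ids[test_ids:test_ids + val_ids]
    train = ids[test_ids + val_ids:]
    gather = lambda group: [img for i in group for img in identity_to_images[i]]
    return gather(train), gather(val), gather(test)
```

Splitting at the identity level (rather than the image level) is what makes the reported evaluation meaningful for a reference-based method, since reference images of a test identity must not appear in training.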
Hardware Specification | Yes | "Table 2: Comparison between CacheKV and other mechanisms for input reference images (run with five reference images on a single GTX 1080)."; "Table 6: Inference time with different numbers of reference images (columns: Num refs, Time@1080, Time@3090)."; "We trained the VQGAN for 200,000 iterations with batch size 32 on four A6000 GPUs for 7 days. We trained the LDM with only LQ condition for 500,000 iterations with batch size 40 on four A6000 GPUs for 7 days. We finetuned the ReF-LDM for 150,000 iterations with batch size 8 on four 3090 GPUs for 6 days."
Software Dependencies | No | The paper mentions models and frameworks used (e.g., VQGAN, LDM, ArcFace) but does not provide specific version numbers for underlying software dependencies such as Python, PyTorch, or CUDA in the main text or appendices.
Experiment Setup | Yes | "In our experiments, we adopt a 512x512 image resolution, fix the number of reference images to five, and set the loss scale λ^time_ID to 0.1. During training, we synthesize input LQ images with σ, r, δ, and q sampled from [0, 16], [1, 32], [0, 20], and [30, 100], respectively. For inference, we use 100 DDIM [24] steps and classifier-free guidance [7] with a scale of 1.5 towards reference images. We provide more implementation details in Appendix G."
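The four sampled parameters describe a standard blur-downsample-noise-JPEG degradation chain commonly used to synthesize LQ face images (σ: Gaussian-blur strength, r: downsampling factor, δ: Gaussian-noise level, q: JPEG quality). A minimal sketch of such an LQ synthesizer using Pillow and NumPy; this is an illustrative reconstruction of the stated ranges, not the authors' code:

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def synthesize_lq(hq, rng, size=512):
    """Degrade an HQ uint8 RGB array into an LQ input: blur -> downsample -> noise -> JPEG.

    Parameter ranges follow the paper: σ in [0, 16], r in [1, 32],
    δ in [0, 20], q in [30, 100]. The pipeline order is an assumption.
    """
    sigma = rng.uniform(0, 16)       # blur strength
    r = rng.uniform(1, 32)           # downsampling factor
    delta = rng.uniform(0, 20)       # additive Gaussian noise std
    q = int(rng.integers(30, 101))   # JPEG quality

    img = Image.fromarray(hq).filter(ImageFilter.GaussianBlur(radius=sigma))
    small_side = max(1, round(size / r))
    small = img.resize((small_side, small_side), Image.BICUBIC)

    arr = np.asarray(small, dtype=np.float32)
    arr += rng.normal(0.0, delta, arr.shape)
    noisy = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    buf = io.BytesIO()
    noisy.save(buf, format="JPEG", quality=q)   # compression artifacts
    buf.seek(0)
    lq = Image.open(buf).resize((size, size), Image.BICUBIC)  # back to 512x512
    return np.asarray(lq)
```

Sampling each degradation strength independently per image covers a wide range of real-world quality levels, which is why the restored output must lean on the reference images for identity detail.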