Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Text2Relight: Creative Portrait Relighting with Text Guidance
Authors: Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, Seungryul Baek
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we demonstrate that our model outperforms existing text-guided image generation models, showing high-quality portrait relighting results with a strong generalization to unconstrained scenes. Experiments: Datasets. For quantitative evaluation, we use our data simulation pipeline to synthesize the ground-truth data for text-guided portrait relighting. Metrics. We utilize various metrics to measure the score between the generated images and ground truths. Baselines. We compare our model with IP2P (Brooks, Holynski, and Efros 2023), GLIDE (Nichol et al. 2021), and MGIE (Fu et al. 2023) in Table 1. User Study. We conduct a user study to compare user perceptual preference, involving 30 participants and 20 samples. Ablation Study. We conduct an ablation study on data, mask condition, and crafted hierarchy. |
| Researcher Affiliation | Collaboration | Junuk Cha1,2*, Mengwei Ren2, Krishna Kumar Singh2, He Zhang2, Yannick Hold-Geoffroy2, Seunghyun Yoon2, Hyun Joon Jung2, Jae Shin Yoon2, Seungryul Baek1 1 UNIST 2 Adobe Research |
| Pseudocode | No | The paper describes methods in paragraph text and illustrates pipelines with figures (e.g., Figure 3, Figure 4, Figure 5, Figure 6), but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project page https://junukcha.github.io/project/text2relight/. This link directs to a project page, not a direct source-code repository for the methodology described in the paper. The paper does not contain an explicit statement about releasing the source code. |
| Open Datasets | Yes | We use real-world portrait images from (Kvanchiani et al. 2023) as testing sets. For quantitative evaluation, we use our data simulation pipeline to synthesize the ground-truth data for text-guided portrait relighting. |
| Dataset Splits | No | Overall, our data has 1.5M pairs of relighting images and associated text prompts. We create 400K and 800K data from OLAT images and a single image, respectively. We create 100K data pair for shadow removal from a lightstage and 200K data pair for light positioning using a single image. We use real-world portrait images from (Kvanchiani et al. 2023) as testing sets. The paper describes the total size of its synthetic data and a dataset used for testing, but does not provide specific training/validation/test splits (e.g., percentages or counts for each partition) for the main synthetic dataset used for training the Text2Relight model. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware details such as GPU models, CPU types, or other computing resources used for running experiments. |
| Software Dependencies | No | The paper mentions several models and tools, such as "ChatGPT", the "latent consistency model (Luo et al. 2023)", the "stable diffusion model (Rombach et al. 2022)", "InstructPix2Pix (Brooks, Holynski, and Efros 2023)", "ArcFace (Deng et al. 2019)", "LLaVA (Liu et al. 2024)", and the "CLIP vision encoder (Radford et al. 2021)", but it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | No | The paper describes the model training objective, $\mathcal{L}_{\text{T2R}}(x) = \lVert \epsilon - f_\theta(\{z_t, I, M\}, t, T)(x) \rVert_2^2$, and states that it repurposes a text-guided image editing model. It mentions "four denoising steps" for a latent consistency model used in the data synthesis pipeline. However, it does not provide specific experimental setup details such as learning rates, batch sizes, number of epochs, or optimizer settings for the training of the Text2Relight model itself. |
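The training objective quoted in the Experiment Setup row reads like a standard diffusion denoising loss. A plausible reconstruction in LaTeX, assuming the usual notation (where $\epsilon$ is the sampled noise, $z_t$ the noisy latent at timestep $t$, $I$ the input portrait, $M$ the mask condition, $T$ the text prompt, and $f_\theta$ the denoising network), is:

```latex
\mathcal{L}_{\text{T2R}}(x) =
  \big\lVert \epsilon - f_\theta(\{z_t, I, M\},\, t,\, T)(x) \big\rVert_2^2
```

This is a best-effort reading of the extraction-garbled formula in the paper text, not a verified transcription; the exact argument grouping (in particular the trailing $(x)$) should be checked against the venue PDF.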