Diffusion-based Image Translation using Disentangled Style and Content Representation
Authors: Gihyun Kwon, Jong Chul Ye
Venue: ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks. |
| Researcher Affiliation | Academia | Gihyun Kwon¹, Jong Chul Ye²,¹. ¹Department of Bio and Brain Engineering, ²Kim Jaechul Graduate School of AI, KAIST. cyclomon, jong.ye@kaist.ac.kr |
| Pseudocode | Yes | A.4 ALGORITHM: For a detailed explanation, we include the algorithm of our proposed image translation method in Algorithm 1. |
| Open Source Code | Yes | Our detailed implementation can be found in our official GitHub repository. |
| Open Datasets | Yes | All experiments were performed using an unconditional score model pre-trained on the ImageNet 256×256 dataset (Dhariwal & Nichol, 2021). For our quantitative results on text-guided image translation, we used two different datasets: Animals and Landscapes. |
| Dataset Splits | No | The paper does not specify explicit train/validation/test dataset splits with percentages or counts, nor does it refer to predefined splits for training and validation purposes. |
| Hardware Specification | Yes | The generation process takes 40 seconds per image on a single RTX 3090 GPU. All experiments are conducted with a single RTX 3090 GPU, on the same hardware and software environment. |
| Software Dependencies | No | The paper mentions using pre-trained models and referring to official source code of other methods but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | In all the experiments, we used a diffusion step count of T = 60 and a resampling repetition of N = 10; therefore, a total of 70 diffusion reverse steps are used. For hyperparameters, we use λ1 = 200, λ2 = 100, λ3 = 2000, λ4 = 1000, λ5 = 200. For image-guided translation, we set λmse = 1.5. For our CLIP loss, we set λs = 0.4 and λi = 0.2. |
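
For readers attempting to reproduce the reported setup, the values quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below is a minimal, hypothetical Python configuration: the class and field names (`TranslationConfig`, `diffusion_steps`, `lambda_mse`, etc.) are our own illustrative choices rather than the authors' actual API, while the numeric values are taken directly from the quoted paper text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationConfig:
    """Hypothetical container for the settings quoted in the Experiment Setup row.

    Field names are illustrative only; the values come from the paper.
    """
    diffusion_steps: int = 60     # T: reverse diffusion steps
    resample_repeats: int = 10    # N: resampling repetitions (60 + 10 = 70 reverse steps total)
    lambda1: float = 200.0        # loss weights lambda_1 ... lambda_5
    lambda2: float = 100.0
    lambda3: float = 2000.0
    lambda4: float = 1000.0
    lambda5: float = 200.0
    lambda_mse: float = 1.5       # used for image-guided translation only
    lambda_s: float = 0.4         # CLIP loss weight (style/semantic term)
    lambda_i: float = 0.2         # CLIP loss weight (image term)


if __name__ == "__main__":
    # Instantiate and inspect the reported configuration.
    config = TranslationConfig()
    print(config)
```

Keeping these values in one frozen dataclass makes it easy to log the exact configuration alongside generated images and to compare against the authors' released code in their GitHub repository.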