Unpaired Image-to-Image Translation via Neural Schrödinger Bridge
Authors: Beomsu Kim, Gihyun Kwon, Kwanyoung Kim, Jong Chul Ye
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on toy and practical image-to-image translation tasks demonstrate that UNSB opens up a new research direction for applying diffusion models to large-scale unpaired translation tasks. Our contributions can be summarized as follows. We identify the cause behind the failure of previous SB methods for image-to-image translation as the curse of dimensionality. We empirically verify this by using a toy task as a sanity check for whether an OT-based method is robust to the curse of dimensionality. We propose UNSB, which formulates SB as a Lagrangian problem under a constraint on the KL divergence between the true target distribution and the model distribution. This leads to a composition of generators learned via adversarial training, which overcomes the curse of dimensionality with the help of advanced discriminators. UNSB improves upon the Denoising Diffusion GAN (Xiao et al., 2022) by enabling translation between arbitrary distributions. Furthermore, based on a comparison with other existing unpaired translation methods, we demonstrate that UNSB is indeed a generalization of them, overcoming their shortcomings. Evaluation. We use four benchmark datasets: Horse2Zebra, Map2Cityscape, Summer2Winter, and Map2Satellite. All images are resized to 256×256. We use the FID score (Heusel et al., 2017) and the KID score (Chen et al., 2020) to measure sample quality. Comparison results. We show quantitative results in Table 2, where we observe that our model outperforms baseline methods on all datasets. In particular, our model largely outperforms early GAN-based translation methods. When compared to recent models such as CUT, our model still shows better scores. NOT suffers from degraded performance on all of the datasets. |
| Researcher Affiliation | Academia | Beomsu Kim¹, Gihyun Kwon², Kwanyoung Kim¹, Jong Chul Ye¹ (equal contribution); ¹Kim Jaechul Graduate School of AI, KAIST; ²Department of Bio and Brain Engineering, KAIST; {beomsu.kim,cyclomon,cubeyoung,jong.ye}@kaist.ac.kr |
| Pseudocode | No | The paper describes the algorithm steps in paragraph form and through figures, but does not include a formally labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Code: https://github.com/cyclomon/UNSB |
| Open Datasets | Yes | Evaluation. We use four benchmark datasets: Horse2Zebra, Map2Cityscape, Summer2Winter, and Map2Satellite. All images are resized to 256×256. We use the FID score (Heusel et al., 2017) and the KID score (Chen et al., 2020) to measure sample quality. (A hedged FID/KID evaluation sketch follows the table.) |
| Dataset Splits | No | The paper mentions using Horse2Zebra, Map2Cityscape, Summer2Winter, and Map2Satellite datasets but does not provide specific training/validation/test splits, only that all images are resized. It implies standard splits for benchmark datasets but does not explicitly state the proportions or methodology for splitting. |
| Hardware Specification | Yes | Training. All experiments are conducted on a single RTX3090 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and specific settings but does not provide version numbers for software libraries like PyTorch, TensorFlow, or Python itself. It only states: "We used official pre-trained models for EGSDE and StarGAN v2 and trained NOT from scratch using the official code." but gives no specific versions for the underlying frameworks. |
| Experiment Setup | Yes | Training. All experiments are conducted on a single RTX3090 GPU. On each dataset, we train our UNSB network for 400 epochs with batch size 1 and the Adam optimizer with β1 = 0.5, β2 = 0.999, and initial learning rate 0.0002. The learning rate is decayed linearly to zero until the end of training. All images are resized to 256×256 and normalized into the range [-1, 1]. For SB training and simulation, we discretize the unit interval [0, 1] into 5 intervals with uniform spacing. We used λ_SB = λ_Reg = 1 and τ = 0.01. To estimate the entropy loss, we used the mutual information neural estimation (MINE) method of Belghazi et al. (2018). To incorporate timestep embedding and stochastic conditioning into our UNSB network, we used positional embedding and AdaIN layers, respectively, following the implementation of DDGAN (Xiao et al., 2022). For I2I tasks, we used the CUT loss as regularization. On the Summer2Winter translation task, we used a pre-trained VGG16 network as our feature-selection source, following the strategy in the previous work of Zheng et al. (2021). (Illustrative, hedged sketches of this setup, the conditioning layers, and the FID/KID evaluation follow the table.) |
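As a concrete point of reference, below is a minimal, hypothetical PyTorch sketch of the preprocessing and optimizer settings quoted in the Experiment Setup row (256×256 resize, [-1, 1] normalization, Adam with β1 = 0.5, β2 = 0.999, initial learning rate 0.0002, linear decay to zero over 400 epochs, and 5 uniform time intervals). The placeholder generator and the `LambdaLR` schedule are assumptions, not the authors' code; the official implementation is at the repository linked above.

```python
# Hypothetical reconstruction of the reported training setup (not the authors' code).
import torch
from torchvision import transforms

# Preprocessing: resize to 256x256 and map pixel values into [-1, 1].
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                                       # [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),      # -> [-1, 1]
])

# 5 uniform intervals on the unit time interval [0, 1].
timesteps = torch.linspace(0.0, 1.0, steps=6)                    # 0.0, 0.2, ..., 1.0

# Placeholder for the UNSB generator (the real network is far larger).
generator = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

num_epochs = 400
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
# Linear decay of the learning rate toward zero by the end of training.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / num_epochs
)
```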
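The setup also notes that timestep and stochastic conditioning are injected through positional embeddings and AdaIN layers, following DDGAN. The sketch below shows one common way such conditioning is implemented; the `timestep_embedding` function and `AdaIN` module are illustrative stand-ins under that assumption, not the authors' exact layers.

```python
# Assumed DDGAN-style conditioning: sinusoidal timestep embedding + AdaIN modulation.
import math
import torch
import torch.nn as nn

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal positional embedding of integer timesteps t (shape [B])."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # [B, dim]

class AdaIN(nn.Module):
    """Adaptive instance normalization: per-channel scale/shift from a condition vector."""
    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=1)
        return self.norm(x) * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

# Example: modulate 64-channel features with the embedding of timestep t = 3.
emb = timestep_embedding(torch.tensor([3]), dim=128)
out = AdaIN(cond_dim=128, num_channels=64)(torch.randn(1, 64, 32, 32), emb)
```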
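Sample quality is reported with FID and KID. A hedged example of how these scores could be computed with `torchmetrics` (an assumed tooling choice; the paper does not say which implementation it used, and this requires the `torchmetrics[image]` extra) is shown below.

```python
# Assumed FID/KID evaluation via torchmetrics; any metric implementation could be used.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)  # subset_size must not exceed the sample count

# Both metrics expect uint8 images of shape [B, 3, H, W]; random data stands in for
# real target-domain images and translated outputs.
real_images = torch.randint(0, 256, (100, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (100, 3, 256, 256), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
kid.update(real_images, real=True)
kid.update(fake_images, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item(), "+/-", kid_std.item())
```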