Analyzing and Improving Optimal-Transport-based Adversarial Networks
Authors: Jaemoo Choi, Jaewoong Choi, Myungjoo Kang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach achieves a FID score of 2.51 on CIFAR-10 and 5.99 on CelebA-HQ-256, outperforming unified OT-based adversarial approaches. In this section, we present qualitative and quantitative generation results of OT-based GANs on both toy and CIFAR-10 (Krizhevsky et al., 2009) datasets. |
| Researcher Affiliation | Academia | Jaemoo Choi Seoul National University toony42@snu.ac.kr Jaewoong Choi Korea Institute for Advanced Study chwj1475@kias.re.kr Myungjoo Kang Seoul National University mkang@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 Unified training algorithm |
| Open Source Code | Yes | To ensure the reproducibility of this work, we submitted the source code in the supplementary materials. |
| Open Datasets | Yes | Experimental Results: To examine the effect of the cost function c(x, y) = τ‖x − y‖₂², we compare the models with τ = 0 (WGAN, UOTM w/o cost) and τ > 0 (OTM, UOTM) in Fig. 1. When τ = 0, both WGAN and UOTM w/o cost exhibit a mode collapse problem. These models fail to fit all modes of the data distribution. On the other hand, WGAN-GP shows a mode mixture problem. WGAN-GP generates inaccurate samples that lie between the modes of the data distribution. In contrast, when τ > 0, both OTM and UOTM avoid the mode collapse and mode mixture problems. In the initial stages of training, OTM succeeds in capturing all modes of the data distribution, until training instability occurs due to loss fluctuation. UOTM achieves the best distribution fitting by exploiting the stability of g1, g2 as well. Moreover, Table 2 provides a quantitative assessment of the mode collapse problem on CIFAR-10. The results are consistent with our analysis on the toy datasets (Fig. 1). The recall metric assesses the mode coverage of each model. In this regard, introducing the cost function improves the recall metric for each model: from WGAN (0.02) to OTM (0.49) and from UOTM w/o cost (0.13) to UOTM (0.62). The precision metric evaluates the faithfulness of generated images for each model. UOTM w/o cost achieves the best precision score, but its recall metric is significantly lower than that of UOTM. This result shows that UOTM w/o cost exhibits the mode collapse problem. From these results, we interpret that the cost function c(x, y) plays a crucial role in preventing mode collapse by guiding the generator towards cost-minimizing pairs. (A hedged sketch of this cost-regularized training loop appears after the table.) |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and CelebA-HQ and evaluating performance using FID scores. However, it does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper discusses network architectures and training parameters but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer with β1 = 0' and 'β1 = 0.5' for different models and a 'gradient clip of 0.1', but it does not specify version numbers for any software libraries (e.g., PyTorch, TensorFlow) or programming languages used. |
| Experiment Setup | Yes | For the generator, we passed z through two fully connected (FC) layers with a hidden dimension of 128, resulting in a 128-dimensional embedding. We used a batch size of 128 and learning rates of 2 × 10⁻⁴ and 10⁻⁴ for the generator and discriminator, respectively. We trained for 30K iterations. For WGANs and OTM, since they do not converge without any regularization, we set the regularization parameter λ = 5. Moreover, we used R1 regularization with λ = 0.2 for all methods and architectures. WGANs are known to perform better with optimizers without a momentum term; thus, we use the Adam optimizer with β1 = 0 for WGANs. Furthermore, since OTM has an algorithm similar to WGAN, we also use the Adam optimizer with β1 = 0. Lastly, following Choi et al. (2023a), we use the Adam optimizer with β1 = 0.5 for UOTM. Note that for all experiments, we use β2 = 0.9 for the optimizer. We used a gradient clip of 0.1 for WGAN. (These settings are collected into a configuration sketch after the table.) |
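
For concreteness, here is a minimal PyTorch sketch of the kind of cost-regularized adversarial training step discussed in the rows above (the unified algorithm covering WGAN, OTM, and UOTM). It is an illustration only: the choice of outer functions g1, g2 (softplus here), the sign conventions, and the toy MLP architecture are our assumptions; the paper's Algorithm 1 and released code are authoritative for the exact objective, regularizers, and schedules.

```python
# Minimal sketch of a unified OT-based adversarial training step (toy 2-D setting).
# Assumptions (not taken verbatim from the paper): semi-dual objective with a
# potential v_phi and a generator T_theta, transport cost c(x, y) = tau * ||x - y||^2,
# and outer functions g1, g2 that are identity (WGAN / OTM) or softplus (UOTM).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(in_dim, out_dim, hidden=128):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

dim = 2
T = make_mlp(dim, dim)   # generator / transport map T_theta
v = make_mlp(dim, 1)     # potential (critic) v_phi

tau = 0.1                # tau = 0 removes the OT cost (WGAN-like); tau > 0 adds it
g1 = F.softplus          # identity -> OTM-style, softplus -> UOTM-style (assumption)
g2 = F.softplus

# beta1 = 0.5 here (UOTM setting); the paper uses beta1 = 0 for WGAN / OTM.
opt_T = torch.optim.Adam(T.parameters(), lr=2e-4, betas=(0.5, 0.9))
opt_v = torch.optim.Adam(v.parameters(), lr=1e-4, betas=(0.5, 0.9))

def cost(x, y):
    # Quadratic transport cost c(x, y) = tau * ||x - y||_2^2 per sample.
    return tau * (x - y).pow(2).sum(dim=1, keepdim=True)

def train_step(x_src, y_data):
    # 1) Potential update: v separates generated pairs from real data through g1, g2.
    y_fake = T(x_src).detach()
    loss_v = (g1(v(y_fake) - cost(x_src, y_fake)) + g2(-v(y_data))).mean()
    opt_v.zero_grad(); loss_v.backward(); opt_v.step()

    # 2) Generator update: T is pushed toward cost-minimizing, high-potential outputs.
    y_fake = T(x_src)
    loss_T = (cost(x_src, y_fake) - v(y_fake)).mean()
    opt_T.zero_grad(); loss_T.backward(); opt_T.step()
    return loss_v.item(), loss_T.item()

# Usage: x_src ~ source (e.g., Gaussian noise), y_data ~ data distribution.
x_src, y_data = torch.randn(128, dim), torch.randn(128, dim) * 0.5 + 1.0
train_step(x_src, y_data)
```

With τ = 0 and identity g1, g2, the same step reduces to a WGAN-style update, which is exactly the ablation axis examined in the toy experiments quoted above.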
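The optimizer and regularization settings in the Experiment Setup row can be gathered into a single configuration sketch. The hyperparameter values (batch size, learning rates, β values, λ values, gradient clip) are taken from the quoted text; the helper functions make_optimizers and r1_penalty are hypothetical names for illustration and do not come from the paper's released code.

```python
# Hedged summary of the quoted toy-experiment setup as a configuration dict.
# Method names and hyperparameters come from the Experiment Setup row above;
# the helpers are illustrative, not taken from the authors' implementation.
import torch

CONFIGS = {
    # beta1 = 0 for WGAN-family and OTM (momentum-free), 0.5 for UOTM; beta2 = 0.9 everywhere.
    # reg_lambda is the method-specific regularization weight quoted as lambda = 5
    # for WGANs and OTM (exact regularizer per the paper); r1_lambda = 0.2 for all methods.
    "WGAN": {"betas": (0.0, 0.9), "reg_lambda": 5.0, "r1_lambda": 0.2, "grad_clip": 0.1},
    "OTM":  {"betas": (0.0, 0.9), "reg_lambda": 5.0, "r1_lambda": 0.2, "grad_clip": None},
    "UOTM": {"betas": (0.5, 0.9), "reg_lambda": None, "r1_lambda": 0.2, "grad_clip": None},
}
BATCH_SIZE, ITERATIONS = 128, 30_000
LR_GENERATOR, LR_DISCRIMINATOR = 2e-4, 1e-4

def make_optimizers(generator, discriminator, method):
    # Adam with the method-specific beta1 and shared beta2 = 0.9.
    cfg = CONFIGS[method]
    opt_g = torch.optim.Adam(generator.parameters(), lr=LR_GENERATOR, betas=cfg["betas"])
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=LR_DISCRIMINATOR, betas=cfg["betas"])
    return opt_g, opt_d

def r1_penalty(discriminator, real, r1_lambda=0.2):
    # R1 regularization: penalize the squared gradient norm of the critic on real samples.
    real = real.detach().requires_grad_(True)
    out = discriminator(real).sum()
    (grad,) = torch.autograd.grad(out, real, create_graph=True)
    return r1_lambda * grad.pow(2).flatten(1).sum(1).mean()
```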