Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport
Authors: Jaemoo Choi, Jaewoong Choi, Myungjoo Kang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these properties empirically through experiments. Moreover, we study the theoretical upper bound of divergence between distributions in UOT. Our model outperforms existing OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 6.36 on CelebA-HQ-256. |
| Researcher Affiliation | Academia | Jaemoo Choi (Seoul National University, toony42@snu.ac.kr); Jaewoong Choi (Korea Institute for Advanced Study, chwj1475@kias.re.kr); Myungjoo Kang (Seoul National University, mkang@snu.ac.kr) |
| Pseudocode | Yes | Algorithm 1 Training algorithm of UOTM |
| Open Source Code | Yes | The code is available at https://github.com/Jae-Moo/UOTM. |
| Open Datasets | Yes | Toy data: For all the Toy dataset experiments, we used the same generator and discriminator architectures. ... CIFAR-10: We utilized all 50,000 samples. ... CelebA-HQ: We used all 120,000 samples. |
| Dataset Splits | No | The paper does not explicitly state training, validation, and test splits for the datasets; it reports only the total number of samples used for CIFAR-10 and CelebA-HQ, and how the toy data was composed. |
| Hardware Specification | Yes | Training on CIFAR-10 takes more than 70 hours for Score SDE [69], 48 hours for DDGAN [80], and 35-40 hours for RGM on four Tesla V100 GPUs. OTM takes approximately 30-35 hours to converge, while our model only takes about 25 hours. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and states that the implementation for the large model follows Choi et al. [13], but it does not provide specific version numbers for software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | For all the Toy dataset experiments, we used the same generator and discriminator architectures. The dimension of the auxiliary variable z is set to one. For the generator, we passed z through two fully connected (FC) layers with a hidden dimension of 128, resulting in a 128-dimensional embedding. ... We used a batch size of 256, a learning rate of 10^-4, and 2000 epochs. ... For the small model setting of UOTM, we employed the architecture of Balaji et al. [8]. ... We set a batch size of 128, 200 epochs, and learning rates of 2 x 10^-4 and 10^-4 for the generator and discriminator, respectively. The Adam optimizer with β1 = 0.5, β2 = 0.9 is employed. Moreover, we use R1 regularization with λ = 0.2. (A minimal sketch of this optimizer and regularization setup is given below the table.) |
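
The Experiment Setup row fixes the optimization details: Adam with β1 = 0.5 and β2 = 0.9, learning rates of 2 x 10^-4 for the generator and 10^-4 for the discriminator, a batch size of 128, 200 epochs, and R1 regularization with λ = 0.2. The sketch below shows how that configuration might be wired into a training loop, assuming PyTorch (the paper does not name a framework, per the Software Dependencies row). The loss terms are generic non-saturating adversarial placeholders rather than the semi-dual UOT objective of Algorithm 1, and the `train` and `r1_penalty` helpers are hypothetical names used only for illustration; the official repository (https://github.com/Jae-Moo/UOTM) is the reference implementation.

```python
import torch
import torch.nn.functional as F


def r1_penalty(discriminator, real_images):
    """Squared gradient norm of the discriminator at real samples (R1 regularization)."""
    real_images = real_images.detach().requires_grad_(True)
    scores = discriminator(real_images)
    grads, = torch.autograd.grad(scores.sum(), real_images, create_graph=True)
    return grads.flatten(1).pow(2).sum(dim=1).mean()


def train(generator, discriminator, loader, z_dim, epochs=200, r1_lambda=0.2, device="cuda"):
    # Optimizer settings quoted in the Experiment Setup row (assumed here to map
    # directly onto torch.optim.Adam).
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))

    for _ in range(epochs):
        for real, _ in loader:  # loader is assumed to yield batches of 128 images
            real = real.to(device)
            z = torch.randn(real.size(0), z_dim, device=device)

            # Discriminator (potential) update.
            # NOTE: generic non-saturating GAN losses are used as placeholders;
            # UOTM instead derives its losses from the semi-dual unbalanced-OT
            # objective (Algorithm 1 in the paper).
            fake = generator(z).detach()
            d_loss = (
                F.softplus(-discriminator(real)).mean()
                + F.softplus(discriminator(fake)).mean()
                + r1_lambda * r1_penalty(discriminator, real)
            )
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()

            # Generator update.
            fake = generator(z)
            g_loss = F.softplus(-discriminator(fake)).mean()
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
```

Two caveats on the sketch: R1 conventions vary, and some implementations scale the squared gradient norm by λ/2 rather than λ; also, in UOTM the generator is a transport map that additionally consumes the auxiliary variable z (dimension one in the toy setup), not the plain noise-to-sample map shown here.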