Sliced Wasserstein with Random-Path Projecting Directions
Authors: Khai Nguyen, Shujian Zhang, Tam Le, Nhat Ho
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare the proposed RPSW and IWRPSW to the existing SW variants such as SW, Max-SW, DSW, and EBSW in gradient flow in Section 4.1 and in training denoising diffusion models in Section 4.3. We show both the qualitative visualization and quantitative comparison (in Wasserstein-2 distance; Flamary et al., 2021) in Figure 1. (A sketch of the Wasserstein-2 evaluation appears below this table.) |
| Researcher Affiliation | Academia | ¹Department of Statistics and Data Sciences, University of Texas at Austin, USA; ²Department of Advanced Data Science, The Institute of Statistical Mathematics (ISM), Japan; ³RIKEN AIP, Japan. |
| Pseudocode | Yes | Algorithm 1: Computational algorithm of RPSW (a hedged re-implementation sketch appears below this table). |
| Open Source Code | Yes | Code for this paper is published at https://github.com/khainb/RPSW. |
| Open Datasets | Yes | Utilizing the MNIST dataset (LeCun et al., 1998), we select images of digit 1 to construct the source distribution and images of digit 0 to construct the target distribution. We follow the setting in Xiao et al. (2021) for diffusion models on CIFAR10 (Krizhevsky et al., 2009) with N = 1800 epochs. (A data-construction sketch appears below this table.) |
| Dataset Splits | No | The paper mentions using standard datasets like MNIST and CIFAR10 but does not explicitly provide specific training, validation, and test split percentages or sample counts for reproducibility within the text. |
| Hardware Specification | Yes | For the gradient flow experiments, we use a HP Omen 25L desktop for conducting experiments. For diffusion model experiments, we use a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not explicitly list specific software components with their version numbers required for reproduction (e.g., Python 3.x, PyTorch x.x, CUDA x.x). |
| Experiment Setup | Yes | We set the total number of projections for SW variants to 10. For DSW, RPSW, and IWRPSW, we set the concentration parameter κ of the PS distribution as a dynamic quantity, i.e., κ(t) = (κ0 − 1)(N − t − 1)/(N − 1) + 10/N with N = 300 and κ0 ∈ {100, 50}. We set L = 10^4 for SW, DSW, EBSW, RPSW, and IWRPSW. We set T ∈ {2, 5, 10} for DSW and T ∈ {100, 1000} for Max-SW. We set the initial learning rate for the discriminator to 10^−4, the initial learning rate for the generator to 1.6 × 10^−4, the Adam optimizer with parameters (β1, β2) = (0.5, 0.9), EMA decay to 0.9999, and batch size to 256. For the learning rate scheduler, we use cosine learning rate decay. (A configuration sketch appears below this table.) |
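
The Wasserstein-2 comparison quoted in the Research Type row cites the POT library (Flamary et al., 2021). Below is a minimal sketch of how such an evaluation is typically computed with POT; the point clouds, weights, and function name `wasserstein2` are illustrative assumptions, not the paper's exact evaluation code.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (Flamary et al., 2021)

def wasserstein2(x, y):
    """Exact Wasserstein-2 distance between two uniform empirical measures."""
    a = np.full(x.shape[0], 1.0 / x.shape[0])  # uniform weights on source points
    b = np.full(y.shape[0], 1.0 / y.shape[0])  # uniform weights on target points
    M = ot.dist(x, y, metric="sqeuclidean")    # pairwise squared-Euclidean cost matrix
    return np.sqrt(ot.emd2(a, b, M))           # square root of the optimal transport cost

# Illustrative usage on two synthetic point clouds.
rng = np.random.default_rng(0)
print(wasserstein2(rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))))
```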
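Algorithm 1 (RPSW) is not reproduced verbatim here; the following is a hedged PyTorch sketch of the core idea as we read it: projecting directions are normalized "random paths", i.e., differences of random pairs drawn from the two empirical measures, and the sliced distance averages one-dimensional Wasserstein distances along those directions. The function names (`rpsw`, `one_d_wasserstein_pp`) and the equal-support-size assumption are ours, not the paper's.

```python
import torch

def one_d_wasserstein_pp(u, v, p=2):
    """W_p^p between 1-D empirical measures with equal support size (sorting trick)."""
    return (torch.sort(u, dim=0).values - torch.sort(v, dim=0).values).abs().pow(p).mean(dim=0)

def rpsw(x, y, num_projections=100, p=2, eps=1e-12):
    """Sketch of RPSW between point clouds x (n, d) and y (m, d), assuming n == m."""
    i = torch.randint(0, x.shape[0], (num_projections,))
    j = torch.randint(0, y.shape[0], (num_projections,))
    paths = x[i] - y[j]                                      # random paths between the measures
    theta = paths / (paths.norm(dim=1, keepdim=True) + eps)  # unit projecting directions
    xp, yp = x @ theta.T, y @ theta.T                        # (n, L) and (m, L) projections
    return one_d_wasserstein_pp(xp, yp, p).mean().pow(1.0 / p)
```

IWRPSW would additionally reweight the per-direction distances with importance weights, in the spirit of EBSW; consult Algorithm 1 and the released code at https://github.com/khainb/RPSW for the authors' exact procedure.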
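For the gradient-flow data described in the Open Datasets row, a minimal torchvision-based construction might look as follows; the root path and flattening/normalization choices are illustrative assumptions.

```python
from torchvision import datasets, transforms

# MNIST (LeCun et al., 1998): digit-1 images form the source, digit-0 images the target.
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
source = mnist.data[mnist.targets == 1].flatten(1).float() / 255.0  # digit 1 -> source
target = mnist.data[mnist.targets == 0].flatten(1).float() / 255.0  # digit 0 -> target
```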
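The Experiment Setup cell can be translated into a configuration sketch. The κ(t) schedule below follows our reconstruction of the extraction-garbled formula and should be checked against the paper; the models are placeholders, and only the optimizer, scheduler, EMA, and batch-size settings come from the table.

```python
import torch

def kappa_schedule(t, kappa0=50.0, N=300):
    # Reconstructed dynamic concentration: kappa(t) = (kappa0 - 1)(N - t - 1)/(N - 1) + 10/N.
    return (kappa0 - 1.0) * (N - t - 1) / (N - 1) + 10.0 / N

generator = torch.nn.Linear(8, 8)      # placeholder for the diffusion-model generator
discriminator = torch.nn.Linear(8, 8)  # placeholder for the discriminator
opt_g = torch.optim.Adam(generator.parameters(), lr=1.6e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))
# Cosine learning-rate decay over the reported N = 1800 epochs; EMA decay 0.9999.
sched_g = torch.optim.lr_scheduler.CosineAnnealingLR(opt_g, T_max=1800)
sched_d = torch.optim.lr_scheduler.CosineAnnealingLR(opt_d, T_max=1800)
ema_decay, batch_size = 0.9999, 256
```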