Markovian Sliced Wasserstein Distances: Beyond Independent Projections

Authors: Khai Nguyen, Tongzheng Ren, Nhat Ho

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we compare MSW distances with previous SW variants in various applications such as gradient flows, color transfer, and deep generative modeling to demonstrate the favorable performance of MSW."
Researcher Affiliation | Academia | Khai Nguyen, Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX 78712, khainb@utexas.edu; Tongzheng Ren, Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, tongzheng@utexas.edu; Nhat Ho, Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX 78712, minhnhat@utexas.edu
Pseudocode | Yes | "Algorithm 1 Max sliced Wasserstein distance" (a sketch of this procedure is given after the table)
Open Source Code | Yes | "Code for this paper is published at https://github.com/UT-Austin-Data-Science-Group/MSW."
Open Datasets | Yes | "We compare MSW with previous baselines including SW, Max-SW, K-SW, and Max-K-SW on benchmark datasets: CIFAR10 (image size 32x32) [29], and CelebA [36] (image size 64x64)."
Dataset Splits | No | The paper mentions training but does not provide explicit training/validation/test splits. It refers to "benchmark datasets" and "standard image datasets", which implies standard splits, but these are not stated explicitly.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using Adam [25] as an optimizer, but does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | "In the experiments, we utilize the Euler scheme with 300 timesteps and the step size is 10^-3 to move the empirical distribution... For Max-SW, Max-K-SW, iMSW, and viMSW, we use the learning rate parameter for projecting directions η = 0.1. ... The number of training iterations is set to 50000. We update the generator Gϕ every 5 iterations while we update the feature function Fγ every iteration. The mini-batch size m is set to 128 in all datasets. The learning rate for Gϕ and Fγ is 0.0002 and the optimizer is Adam [25] with parameters (β1, β2) = (0, 0.9). We use the order p = 2 for all sliced Wasserstein variants." (sketches of the quoted gradient-flow scheme and training configuration follow after the table)
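
The quoted pseudocode, Algorithm 1, computes the max sliced Wasserstein (Max-SW) distance by optimizing a single projection direction. As a rough illustration, here is a minimal PyTorch sketch written for this report; the function names, the number of ascent steps T, and the default step size eta are illustrative assumptions rather than the authors' implementation (their actual code is in the repository linked above).

```python
# Minimal sketch (not the authors' code): Max-SW via projected gradient ascent.
import torch

def one_d_wasserstein_p(x_proj, y_proj, p=2):
    # For 1D empirical measures with the same number of points and uniform weights,
    # W_p^p is the mean of |sorted(x) - sorted(y)|^p.
    x_sorted, _ = torch.sort(x_proj)
    y_sorted, _ = torch.sort(y_proj)
    return torch.mean(torch.abs(x_sorted - y_sorted) ** p)

def max_sliced_wasserstein(X, Y, p=2, T=100, eta=0.1):
    # Gradient ascent over a single unit projection direction theta, then report
    # the 1D Wasserstein distance along the best direction found.
    d = X.shape[1]
    theta = torch.randn(d)
    theta = theta / torch.norm(theta)
    theta.requires_grad_(True)
    for _ in range(T):
        loss = one_d_wasserstein_p(X @ theta, Y @ theta, p)
        grad = torch.autograd.grad(loss, theta)[0]
        with torch.no_grad():
            theta += eta * grad          # ascent step on the projection direction
            theta /= torch.norm(theta)   # project back onto the unit sphere
    theta_final = theta.detach()
    return one_d_wasserstein_p(X @ theta_final, Y @ theta_final, p) ** (1.0 / p)
```

The default eta = 0.1 mirrors the quoted learning rate for projecting directions; the number of ascent steps T is not specified in the quoted text and is only a placeholder.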
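
The gradient-flow part of the quoted setup (an Euler scheme with 300 timesteps and step size 10^-3) can be made concrete as an explicit Euler discretization that moves source particles toward the target by descending a sliced Wasserstein-type loss. The sketch below is an assumption-laden illustration: it reuses the max_sliced_wasserstein function from the previous sketch as the loss and invents a generic particle setup; only the timestep count and step size come from the quote.

```python
import torch

def gradient_flow(X0, Y, loss_fn, timesteps=300, step_size=1e-3):
    # Explicit Euler scheme: repeatedly move the particles X along the negative
    # gradient of the chosen sliced Wasserstein-type loss toward the target Y.
    X = X0.clone().requires_grad_(True)
    for _ in range(timesteps):
        loss = loss_fn(X, Y)
        grad = torch.autograd.grad(loss, X)[0]
        with torch.no_grad():
            X -= step_size * grad  # Euler step with the quoted step size 10^-3
    return X.detach()

# Illustrative usage with the Max-SW sketch above (random source, shifted target):
# X0, Y = torch.randn(200, 2), torch.randn(200, 2) + 3.0
# X_moved = gradient_flow(X0, Y, max_sliced_wasserstein)
```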
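
Likewise, the quoted deep generative modeling hyperparameters (Adam with (β1, β2) = (0, 0.9), learning rate 0.0002, 50000 iterations, mini-batch size 128, generator updated every 5 iterations, order p = 2) map onto an optimizer setup along the lines sketched below. The generator and feature_net modules are hypothetical placeholders for Gϕ and Fγ, and the loss computations are omitted.

```python
# Sketch of the quoted training schedule; the networks are placeholder modules,
# not the authors' architectures, and the sliced Wasserstein losses are omitted.
import torch
import torch.nn as nn

generator = nn.Linear(128, 3 * 64 * 64)    # placeholder for G_phi
feature_net = nn.Linear(3 * 64 * 64, 128)  # placeholder for F_gamma

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.0, 0.9))
opt_f = torch.optim.Adam(feature_net.parameters(), lr=2e-4, betas=(0.0, 0.9))

num_iterations = 50_000
batch_size = 128  # mini-batch size m
p = 2             # order used for all sliced Wasserstein variants

for it in range(num_iterations):
    # update the feature function F_gamma every iteration (loss omitted)
    ...
    if it % 5 == 0:
        # update the generator G_phi once every 5 iterations, as quoted
        ...
```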