Non-geodesically-convex optimization in the Wasserstein space

Authors: Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, Petrus Mikkola, Marcelo Hartmann, Kai Puolamäki, Arto Klami

NeurIPS 2024

Each entry below gives a reproducibility variable, the assessed result, and the LLM response.
Research Type: Experimental. We perform numerical sampling experiments from non-log-concave distributions: the Gaussian mixture distribution and the distance-to-set prior [61] relaxed von Mises Fisher distribution. Both are log-DC and the latter has non-differentiable logarithmic probability density (see Appx. C). Fig. 1 presents the sampling results. Experiment details are in Appx. B and Appx. C1.
Researcher Affiliation: Academia. Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams, Petrus Mikkola, Marcelo Hartmann, Kai Puolamäki, Arto Klami; Department of Computer Science, University of Helsinki.
Pseudocode: Yes. Algorithm 1, semi FB Euler for sampling (Appendix B), and Algorithm 2, FB Euler for sampling (Appendix B). (A generic particle-level sketch of the forward-backward splitting idea is given after these entries.)
Open Source Code: Yes. Our code is available at https://github.com/MCS-hub/OW24
Open Datasets: No. The paper defines synthetic distributions (Gaussian mixture, relaxed von Mises Fisher) with parameters in Appendix C for its experiments, rather than using pre-existing public datasets. No specific link, DOI, or formal citation to a public dataset is provided.
Dataset Splits: No. The paper conducts numerical sampling experiments and describes training parameters (e.g., iterations, learning rates, batch size) but does not define explicit training, validation, or test dataset splits in the conventional supervised learning sense.
Hardware Specification: No. We perform numerical experiments on a high-performance computing cluster with GPU support. We allocate 8 GB of memory for the experiments.
Software Dependencies: No. We use Python version 3.8.0. Our implementation is based on the code of [53] (MIT license) with the Dense ICNN architecture [41]. (A minimal generic ICNN sketch is given after these entries.)
Experiment Setup: Yes. Experiment details: We set K = 5 and randomly generate x1, x2, ..., x5 ∈ R^2. We set σ = 1. The initial distribution is µ0 = N(0, 16I). We use η = 0.1 for both FB Euler and semi FB Euler. We train both algorithms for 40 iterations using the Adam optimizer with a batch size of 512, in which the first 20 iterations use a learning rate of 5 × 10^-3 while the latter 20 iterations use 2 × 10^-3. For the baseline ULA, we run 10000 chains in parallel for 4000 iterations with a learning rate of 10^-3. (The ULA baseline setup is sketched after these entries.)
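
The paper's Algorithms 1 and 2 operate at the level of probability measures, with the backward (proximal/JKO-type) step parameterized by an ICNN; the details are in Appendix B of the paper. Purely to illustrate the forward-backward splitting idea behind the names, the following is a minimal, hypothetical particle-level sketch of a proximal-gradient Langevin update for a potential written as a sum F + G, where the smooth part F gets an explicit (forward) gradient step plus Gaussian noise and G is handled by its proximal operator (backward step). The potentials, the step size, and the iteration count are placeholders; this is not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical potential split V(x) = F(x) + G(x):
#   F(x) = ||x - m||^2 / 2   (smooth part, forward gradient step)
#   G(x) = lam * ||x||_1     (non-smooth part, handled by its prox)
m, lam = np.array([1.0, -1.0]), 0.5

def grad_F(x):
    return x - m

def prox_G(x, step):
    # Soft-thresholding: closed-form prox of step * lam * ||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)

def fb_euler_step(x, step):
    """One forward-backward (proximal Langevin) update applied to each particle."""
    noise = rng.standard_normal(x.shape)
    y = x - step * grad_F(x) + np.sqrt(2.0 * step) * noise  # forward step + diffusion
    return prox_G(y, step)                                   # backward (proximal) step

# Illustrative run: 1,000 particles in R^2, arbitrary step size and iteration count.
x = rng.standard_normal((1_000, 2)) * 4.0
for _ in range(1_000):
    x = fb_euler_step(x, step=0.1)
```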
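The DenseICNN architecture of [41] used in the released code has its own specifics; for orientation only, here is a minimal sketch of a generic input-convex neural network (ICNN) in PyTorch, where convexity in the input is enforced by keeping the weights on the hidden path non-negative and using a convex, non-decreasing activation. The layer sizes, depth, and weight-clamping scheme are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyICNN(nn.Module):
    """Generic ICNN: z_{k+1} = softplus(W_z^k z_k + W_x^k x + b_k), with W_z^k >= 0."""
    def __init__(self, dim=2, hidden=64, depth=3):
        super().__init__()
        self.first = nn.Linear(dim, hidden)
        self.skip = nn.ModuleList(nn.Linear(dim, hidden) for _ in range(depth - 1))
        self.hid = nn.ModuleList(nn.Linear(hidden, hidden, bias=False) for _ in range(depth - 1))
        self.out = nn.Linear(hidden, 1, bias=False)

    def clamp_weights(self):
        # Project the hidden-path and output weights onto the non-negative orthant
        # (typically after each optimizer step); this is what keeps the map convex in x.
        for layer in list(self.hid) + [self.out]:
            layer.weight.data.clamp_(min=0.0)

    def forward(self, x):
        z = F.softplus(self.first(x))
        for lin_z, lin_x in zip(self.hid, self.skip):
            z = F.softplus(lin_z(z) + lin_x(x))
        return self.out(z)  # scalar output, convex as a function of x
```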
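As a concrete reading of the quoted experiment setup, the sketch below builds a K = 5 Gaussian mixture target in R^2 with σ = 1 and random centers, then runs the ULA baseline as described: 10,000 parallel chains for 4,000 iterations with step size 10^-3. The mixture weights (uniform), the spread of the random centers, and the N(0, 16I) initialization of the ULA chains are assumptions for illustration (the paper states the N(0, 16I) initial distribution for its main algorithms); the paper's FB Euler and semi FB Euler runs additionally train ICNNs with Adam as quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: equally weighted mixture of K = 5 isotropic Gaussians with sigma = 1 in R^2.
K, sigma = 5, 1.0
centers = rng.normal(size=(K, 2)) * 5.0  # assumed spread of the random centers

def grad_log_density(x):
    """Gradient of log p(x) for the equally weighted Gaussian mixture, per particle."""
    diffs = x[:, None, :] - centers[None, :, :]                 # (n, K, 2)
    log_w = -0.5 * np.sum(diffs ** 2, axis=-1) / sigma ** 2     # (n, K), up to constants
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                           # responsibilities
    return -(w[:, :, None] * diffs).sum(axis=1) / sigma ** 2    # (n, 2)

# ULA baseline as quoted: 10,000 chains, 4,000 iterations, step size 1e-3.
step, n_chains, n_iters = 1e-3, 10_000, 4_000
x = rng.normal(size=(n_chains, 2)) * 4.0  # assumed N(0, 16 I) initialization
for _ in range(n_iters):
    x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
# x now holds approximate samples from the mixture target.
```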