Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters
Authors: Maxence Noble, Valentin De Bortoli, Arnaud Doucet, Alain Durmus
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our methodology can be applied in high-dimensional settings such as image interpolation and Bayesian fusion. In our experiments, we illustrate the performance of Tree DSB to compute entropic regularized Wasserstein barycenters for various tasks. |
| Researcher Affiliation | Academia | Maxence Noble CMAP, CNRS, École polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France Valentin De Bortoli Computer Science Department, ENS, CNRS, PSL University Arnaud Doucet Department of Statistics, University of Oxford, UK Alain Oliviero Durmus CMAP, CNRS, École polytechnique, Institut Polytechnique de Paris, 91120 Palaiseau, France |
| Pseudocode | Yes | Algorithm 1 Tree DSB (Training) (a hedged sketch of such an alternating training loop is given below the table) |
| Open Source Code | Yes | Code available at https://github.com/maxencenoble/tree-diffusion-schrodinger-bridge. |
| Open Datasets | Yes | We then turn to an image experiment using MNIST dataset (Le Cun, 1998). Here, we consider a logistic regression model applied to the wine dataset (d = 42) and proceed as follows. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, absolute counts, or references to predefined splits). |
| Hardware Specification | Yes | Our experiments ran on 1 Intel Xeon CPU Gold 6230 20 cores @ 2.1 Ghz CPU. Our experiments ran using 1 Nvidia A100. |
| Software Dependencies | No | The numerical experiments presented in Section 7 are obtained by our own PyTorch implementation... We optimize the networks with ADAM (Kingma & Ba, 2014)... However, specific version numbers for PyTorch or other libraries are not provided. |
| Experiment Setup | Yes | We consider 50 steps for the time discretization on [0, T]. We refer to Appendix G for details on the choice of the schedule, the architecture of the neural networks and the settings of our experiments. In our experiments, we set N = 50 and γ₀ = 10⁻⁵. We optimize the networks with ADAM (Kingma & Ba, 2014) with learning rate 10⁻⁴ and momentum 0.9. For each of the networks, we set the batch size to 4,096 and the number of iterations to 10,000 for the synthetic datasets and 15,000 for the subset posterior aggregation task. In the case of the experiments related to MNIST dataset, we use a reduced UNET architecture based on Nichol & Dhariwal (2021), where we set the number of channels to 64 rather than 128. We implement an exponential moving average of network parameters across training iterations, with rate 0.999. Finally, we set the batch size to 256 and the number of training iterations to 30,000. (A minimal configuration sketch reflecting these settings is given below the table.) |
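
The pseudocode item above refers to the paper's Algorithm 1 (Tree DSB training), which alternates Schrödinger-bridge refinements along the edges of a tree whose vertices carry the input measures and the barycenter node. The snippet below is a minimal, hypothetical PyTorch sketch of such an alternating, IPF-style loop; the drift networks, the flat step-size schedule, the Gaussian stand-in for the barycenter node and the crude reverse-increment regression are illustrative assumptions, not the authors' Algorithm 1 or their released code.

```python
# Hypothetical, heavily simplified sketch of an IPF-style alternating loop
# over the edges of a star-shaped tree (leaves = input measures, centre =
# barycenter node). Not the authors' Algorithm 1: the loss is a crude
# reverse-increment regression and the schedule is flat.
import torch
import torch.nn as nn


class Drift(nn.Module):
    """Small time-conditioned drift network taking (x, t) as input."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))


def simulate(drift, x0, gammas):
    """Euler-Maruyama rollout of dX = drift dt + sqrt(2) dW; returns all states."""
    xs, x = [x0], x0
    for k, g in enumerate(gammas):
        t = torch.full((x.shape[0], 1), k / len(gammas))
        x = x + g * drift(x, t) + (2 * g) ** 0.5 * torch.randn_like(x)
        xs.append(x)
    return xs


def half_bridge_step(train_drift, frozen_drift, x_start, gammas, opt):
    """One simplified IPF half-step: roll trajectories with the frozen drift,
    then regress the trainable drift on the reversed increments."""
    with torch.no_grad():
        xs = simulate(frozen_drift, x_start, gammas)
    loss, N = 0.0, len(gammas)
    for k, g in enumerate(gammas):
        t = torch.full((x_start.shape[0], 1), (N - 1 - k) / N)
        target = (xs[k] - xs[k + 1]) / g          # point back towards xs[k]
        loss = loss + ((train_drift(xs[k + 1], t) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Toy setup: three Gaussian leaves attached to a central (barycenter) node.
dim, batch = 2, 256
means = [torch.tensor([4., 0.]), torch.tensor([-4., 0.]), torch.tensor([0., 4.])]
N_steps, gamma0 = 50, 1e-5                 # values quoted in the paper
gammas = [gamma0 + 1e-3] * N_steps         # flat schedule: an assumption here

# One forward/backward drift pair per tree edge (leaf <-> centre).
edges = [{"fwd": Drift(dim), "bwd": Drift(dim)} for _ in means]
opts = [{k: torch.optim.Adam(v.parameters(), lr=1e-4) for k, v in e.items()}
        for e in edges]

for _ in range(2):                         # outer IPF sweeps (2 for the sketch)
    for e, o, m in zip(edges, opts, means):
        leaf_samples = torch.randn(batch, dim) + m
        centre_samples = torch.randn(batch, dim)   # crude barycenter stand-in
        half_bridge_step(e["bwd"], e["fwd"], leaf_samples, gammas, o["bwd"])
        half_bridge_step(e["fwd"], e["bwd"], centre_samples, gammas, o["fwd"])
```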
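The experiment-setup quote above boils down to a small set of hyperparameters. The configuration sketch below is a hypothetical rendering of those numbers in PyTorch; the `Unet` placeholder, the `training_step` helper and the reading of 'momentum 0.9' as ADAM's beta1 are assumptions, not the authors' code.

```python
# Hypothetical configuration sketch matching the quoted settings. `Unet` and
# `training_step` are placeholders, not the authors' implementation.
import torch


class ExperimentConfig:
    n_time_steps = 50        # time discretization of [0, T]
    gamma0 = 1e-5            # initial value of the gamma schedule
    lr = 1e-4                # ADAM learning rate
    betas = (0.9, 0.999)     # 'momentum 0.9' read as ADAM's beta1; beta2 assumed default
    ema_rate = 0.999         # exponential moving average of network parameters
    # synthetic datasets / subset posterior aggregation
    batch_size_vec = 4096
    iters_synthetic = 10_000
    iters_posterior = 15_000
    # MNIST with the reduced UNet (64 channels instead of 128)
    batch_size_mnist = 256
    iters_mnist = 30_000
    unet_channels = 64


def make_optimizer(model, cfg=ExperimentConfig):
    return torch.optim.Adam(model.parameters(), lr=cfg.lr, betas=cfg.betas)


@torch.no_grad()
def ema_update(ema_model, model, rate=ExperimentConfig.ema_rate):
    """Polyak averaging of parameters across training iterations."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(rate).add_(p, alpha=1.0 - rate)


# Usage sketch (assuming some score network `Unet` and loss `training_step`):
#   model = Unet(channels=ExperimentConfig.unet_channels)
#   ema_model = copy.deepcopy(model)          # requires `import copy`
#   opt = make_optimizer(model)
#   for step in range(ExperimentConfig.iters_mnist):
#       loss = training_step(model, batch)
#       opt.zero_grad(); loss.backward(); opt.step()
#       ema_update(ema_model, model)
```

The batch size and iteration count are kept as separate fields because the paper reports different values per task: 4,096 with 10,000 or 15,000 iterations for the synthetic and subset-posterior-aggregation experiments, and 256 with 30,000 iterations for MNIST.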