Optimal Transport with Tempered Exponential Measures
Authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments We provide experimental evidence to validate the results in the paper. For each case, we sample M uniformly between r0, 1s and also sample r and c randomly. Due to limited space, we defer some of the results to the appendix. t-Sinkhorn Distances We plot the relative cost of the tempered entropic regularized OT to the value of the unregularized (measured or expected) cost. For the experiment, we set n 64 and average over 20 trials. Figure 3 shows the relative expected cost for different t and λ. The relative cost decreases with larger λ, and the asymptotic value is closer to zero for t is closer to one, which is the case for the EOT. Convergence of Tempered OT We measure the number of steps to converge for the tempered entropic regularized OT problem using Sinkhorn s iterations for different values of t and λ. We stop Sinkhorn s iterations when the maximum absolute change in each coordinate of is less than 1e 10. For the experiment, we set n 64 and average over 100 trials. Figure 4 shows the number of iterations to converge along with the relative expected cost to the solution of the unregularized OT problem. Sparse Solutions We analyze the sparsity of the solution of the (unregularized) OT problem as well as the solutions of the regularized expected cost problem (8) for t P r1, 2q. Note that t 1 is equal to the EOT problem (1). We set n 32 and, for each case, set λ 6.0{t to offset the scaling factor in (19). In Figure 5, we show the non-zero values of the transport plans (more precisely, values larger than 1e 25). OT induces a sparse solution with 2n 1 63 non-zero components. On the other hand, the EOT (t 1) solution is fully dense, with 1024 non-zero components. The sparsity increased by increasing t P r1, 2q. In this case, the transport plan with t 1.9 has only 83 non-zero values. |
| Researcher Affiliation | Industry | Ehsan Amid1*, Frank Nielsen2, Richard Nock3, Manfred K. Warmuth3 1 Google Deep Mind 2 Sony Computer Science Laboratories Inc. 3 Google Research |
| Pseudocode | Yes | Algorithm 1: Sinkhorn(K, r, c) ... Algorithm 2: Regularized Optimal Transport with TEMs |
| Open Source Code | No | The paper provides a link to its arXiv preprint ('1The full article is available at https://arxiv.org/abs/2309.04015.'), but it does not contain an explicit statement about releasing source code for the methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | No | The paper states: 'For each case, we sample M uniformly between r0, 1s and also sample r and c randomly.' This indicates that the data used for experiments is generated synthetically and randomly, rather than being a pre-existing publicly available dataset that would require access information. |
| Dataset Splits | No | The paper describes random generation of input data for its experiments ('sample M uniformly between r0, 1s and also sample r and c randomly'). As such, it does not use or specify traditional training, validation, or test splits from a fixed dataset. |
| Hardware Specification | No | The paper mentions parameters like 'n 64' and 'n 32' for the experiments and states averaging over trials, but it does not specify any details about the hardware used, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper describes algorithms like Sinkhorn's, but it does not provide specific names or version numbers of any software dependencies, libraries, or frameworks used for implementation (e.g., Python, PyTorch, TensorFlow, or specific numerical solvers). |
| Experiment Setup | Yes | For each case, we sample M uniformly between r0, 1s and also sample r and c randomly. ... For the experiment, we set n 64 and average over 20 trials. ... We stop Sinkhorn s iterations when the maximum absolute change in each coordinate of is less than 1e 10. ... We set n 32 and, for each case, set λ 6.0{t to offset the scaling factor in (19). |