Smooth $p$-Wasserstein Distance: Structure, Empirical Approximation, and Statistical Applications
Authors: Sloan Nietert, Ziv Goldfeld, Kengo Kato
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several numerical experiments supporting the theoretical results established in the previous sections. We focus on p = 2 so that we can use the MMD form for d_2^{(σ)}. Code is provided at https://github.com/sbnietert/smooth-Wp. First, we examine W_2^{(σ)}(µ, µ̂_n) directly, with computations feasible for small sample sizes using the stochastic averaged gradient (SAG) method as proposed by Genevay et al. (2016) and implemented by Hallin et al. (2020). In Figure 1 (left), we take µ = Unif([-1, 1]^d) and estimate E[W_2^{(σ)}(µ̂_n, µ)] averaged over 10 trials, for varied d and σ. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Cornell University, Ithaca, NY; ²School of Electrical and Computer Engineering, Cornell University, Ithaca, NY; ³Department of Statistics and Data Science, Cornell University, Ithaca, NY |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is provided at https://github.com/sbnietert/smooth-Wp. |
| Open Datasets | No | The paper uses synthetic distributions (e.g., µ = Unif([-1, 1]^d), µ = N_σ) from which samples are drawn, but does not provide access information (link, DOI, citation) for a pre-existing publicly available dataset file. |
| Dataset Splits | No | The paper does not provide specific details on train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions methods and their implementations (e.g., stochastic averaged gradient method implemented by Hallin et al., 2020) but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In Figure 1 (left), we take µ = Unif([-1, 1]^d) and estimate E[W_2^{(σ)}(µ̂_n, µ)] averaged over 10 trials, for varied d and σ. ... Distributions are computed using kernel density estimation over 50 trials and estimating µ by µ̂_1000. ... Plotted in Figure 3 are n-scaled scatter plots of the estimation errors, with 40 trials for each σ and n pair. ... using 1000 and 200 bootstrap samples for d = 1 and d = 2, respectively. The probability of rejecting the null hypothesis for varied significance levels and sample sizes is estimated by repeating the tests over 100 and 200 draws of the original samples, for d = 1 and d = 2 respectively. |
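The setup above relies on the paper's SAG-based solver, but the quantity being estimated, E[W_2^{(σ)}(µ̂_n, µ)], can be illustrated with a much cruder Monte Carlo sketch: the smooth Wasserstein distance W_2^{(σ)}(µ, ν) is W_2 between the Gaussian-smoothed measures µ ∗ N(0, σ²I) and ν ∗ N(0, σ²I), which we approximate here by adding a single Gaussian noise draw to each sample and solving the exact assignment problem between equal-size empirical measures. This is not the authors' method; the function names, sample sizes, and the use of a large reference sample as a proxy for µ are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def smooth_w2(x, y, sigma, rng):
    """Crude one-noise-draw estimate of W_2^{(sigma)} between two
    equal-size empirical measures (not the paper's SAG estimator)."""
    # Smooth each empirical measure by adding N(0, sigma^2 I) noise.
    xs = x + sigma * rng.standard_normal(x.shape)
    ys = y + sigma * rng.standard_normal(y.shape)
    # Exact W_2 between uniform empirical measures of equal size
    # reduces to a minimum-cost assignment on squared distances.
    cost = ((xs[:, None, :] - ys[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))

rng = np.random.default_rng(0)
d, n, sigma = 2, 200, 0.5          # illustrative choices, not the paper's
# Proxy for mu = Unif([-1, 1]^d): a large reference sample.
mu_ref = rng.uniform(-1, 1, size=(10 * n, d))
mu_hat = rng.uniform(-1, 1, size=(n, d))  # empirical measure of size n
# Average over 10 trials, echoing the protocol in the row above.
vals = [smooth_w2(mu_hat,
                  mu_ref[rng.choice(10 * n, n, replace=False)],
                  sigma, rng)
        for _ in range(10)]
print(np.mean(vals))
```

For small σ this estimate approaches the plain empirical W_2, while larger σ shrinks it, which is the qualitative trend the paper's Figure 1 examines across d and σ.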