Debiaser Beware: Pitfalls of Centering Regularized Transport Maps
Authors: Aram-Alexandre Pooladian, Marco Cuturi, Jonathan Niles-Weed
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These claims are validated experimentally on synthetic and real datasets, and should reopen the debate on whether debiasing is needed when using entropic OT. |
| Researcher Affiliation | Collaboration | Aram-Alexandre Pooladian (1), Marco Cuturi (2), Jonathan Niles-Weed (1, 3). (1) Center for Data Science, New York University, New York, USA; (2) Google Research, currently at Apple; (3) Courant Institute of Mathematical Sciences, New York University, New York, USA. Correspondence to: Aram-Alexandre Pooladian <aramalexandre.pooladian@nyu.edu>. |
| Pseudocode | No | The paper presents mathematical formulations, proofs, and experimental results, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our numerical experiments were performed using Google Colab Pro, where our code is adapted from (Chizat et al., 2020) and is publicly available. |
| Open Datasets | Yes | We turn our attention to an application of map estimation using real-world data, where practitioners may not have a priori knowledge of a map even existing between the source and target measures. Such an example arises in (Demetci et al., 2021; Moriel et al., 2021; Schiebinger et al., 2019), where the task is to infer cellular evolution from population measurements. |
| Dataset Splits | Yes | Across a range of ε values, we perform the following experiment across 20 trials where the train/test split is 50/50. |
| Hardware Specification | No | Our numerical experiments were performed using Google Colab Pro. (This indicates a computing environment but lacks specific hardware details like GPU/CPU models or memory.) |
| Software Dependencies | No | We compute the (unregularized) $W_2$ distance using the Python OT (POT) package (Flamary et al., 2021). (Mentions a package but gives no version number. No other software is mentioned with a specific version.) |
| Experiment Setup | Yes | For both d = 5 and d = 10, we have $N_S = 50000$ as the number of points to approximate the MSE $\lVert \hat{T} - T_0 \rVert^2_{L^2(P)}$ via Monte-Carlo integration after having learned the maps. For fixed ε (we chose ε = 0.5 in both d = 5 and d = 10), we sample n points from the source (where n varies between 100 and 10000), and then re-sample n points from the source and map them according to T, effectively generating samples from the target distribution. We run this procedure 20 times to generate error bars on the plots. Across a range of ε values, we perform the following experiment across 20 trials where the train/test split is 50/50. |
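
The synthetic procedure quoted in the Experiment Setup row (sample n source points, push a fresh batch of n source points through the true map to obtain target samples, learn a map, then estimate the squared L2(P) error by Monte-Carlo integration) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the estimator used here (an entropic map built from Sinkhorn dual potentials), the uniform source distribution, the linear ground-truth map, and all function names are assumptions chosen only to make the sketch self-contained. The sample sizes are kept small; the quoted setup uses 50000 Monte-Carlo points, n between 100 and 10000, and 20 repetitions.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def half_sq_dists(x, y):
    # Cost matrix C[i, j] = 0.5 * ||x_i - y_j||^2 (clipped at 0 against rounding).
    x2 = np.sum(x ** 2, axis=1)[:, None]
    y2 = np.sum(y ** 2, axis=1)[None, :]
    return np.maximum(0.5 * (x2 + y2) - x @ y.T, 0.0)


def sinkhorn_potential_g(x, y, eps, n_iters=200):
    # Log-domain Sinkhorn with uniform weights; returns the target-side potential g.
    n, m = x.shape[0], y.shape[0]
    cost = half_sq_dists(x, y)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iters):
        f = -eps * _logsumexp((g[None, :] - cost) / eps - np.log(m), axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps - np.log(n), axis=0)
    return g


def entropic_map(x_new, y, g, eps):
    # Softmax-weighted average of target points (assumed estimator, see lead-in).
    logits = (g[None, :] - half_sq_dists(x_new, y)) / eps
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    return w @ y


def run_trial(n, eps, n_mc, rng, t_0, sample_source):
    # One repetition: fit on n source / n target samples, evaluate on n_mc fresh points.
    x = sample_source(n, rng)
    y = t_0(sample_source(n, rng))  # fresh source samples pushed through the true map
    g = sinkhorn_potential_g(x, y, eps)
    x_mc = sample_source(n_mc, rng)
    err = entropic_map(x_mc, y, g, eps) - t_0(x_mc)
    return float(np.mean(np.sum(err ** 2, axis=1)))  # Monte-Carlo estimate of the MSE


if __name__ == "__main__":
    d, eps = 5, 0.5  # d and eps as quoted in the setup
    rng = np.random.default_rng(0)
    sample_source = lambda k, r: r.uniform(-1.0, 1.0, size=(k, d))  # hypothetical source P
    t_0 = lambda z: 2.0 * z  # hypothetical ground-truth map
    # A few repetitions give a mean and spread for error bars (the setup uses 20).
    errs = [run_trial(200, eps, 2000, rng, t_0, sample_source) for _ in range(5)]
    print(np.mean(errs), np.std(errs))
```

Because the assumed estimator returns a convex combination of the target samples, its output always lies inside their bounding box, which gives a quick sanity check when swapping in a different source or map.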