Debiaser Beware: Pitfalls of Centering Regularized Transport Maps

Authors: Aram-Alexandre Pooladian, Marco Cuturi, Jonathan Niles-Weed

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | These claims are validated experimentally on synthetic and real datasets, and should reopen the debate on whether debiasing is needed when using entropic OT.
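To make the object under discussion concrete, here is a minimal NumPy sketch of an entropic map estimator: a log-domain Sinkhorn solver with uniform weights and cost ½‖x − y‖², followed by the barycentric projection of the resulting plan. The debiasing rule used in `debiased_map`, namely T_PQ − T_PP + Id (the Sinkhorn-divergence construction), is our assumption about what "debiasing" refers to, and all function names are hypothetical; this is an illustration, not the authors' code.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def _half_sq_cost(x, y):
    # C[i, j] = |x_i - y_j|^2 / 2
    return 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)


def sinkhorn_potentials(x, y, eps, n_iter=500):
    """Log-domain Sinkhorn with uniform weights; returns dual potentials (f, g)."""
    n, m = len(x), len(y)
    C = _half_sq_cost(x, y)
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # Alternating soft c-transform updates of the two potentials.
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    return f, g


def entropic_map(x_src, y_tgt, eps, n_iter=500):
    """Entropic map estimate: barycentric projection of the Sinkhorn plan."""
    _, g = sinkhorn_potentials(x_src, y_tgt, eps, n_iter)

    def T(z):
        logits = (g[None, :] - _half_sq_cost(z, y_tgt)) / eps
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # Each output is a convex combination of the target samples.
        return w @ y_tgt

    return T


def debiased_map(x_src, y_tgt, eps, n_iter=500):
    """ASSUMED debiasing rule: T_PQ - T_PP + Id (Sinkhorn-divergence style)."""
    T_pq = entropic_map(x_src, y_tgt, eps, n_iter)
    T_pp = entropic_map(x_src, x_src, eps, n_iter)
    return lambda z: T_pq(z) - T_pp(z) + z
```

For example, `entropic_map(x, y, eps=0.5)` returns a callable that can be evaluated at new source points. A quick sanity check on the sketch: when the source and target samples coincide, `debiased_map` reduces exactly to the identity, whereas the plain entropic self-map generally does not.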
Researcher Affiliation | Collaboration | Aram-Alexandre Pooladian (1), Marco Cuturi (2), Jonathan Niles-Weed (1, 3). (1) Center for Data Science, New York University, New York, USA; (2) Google Research, currently at Apple; (3) Courant Institute of Mathematical Sciences, New York University, New York, USA. Correspondence to: Aram-Alexandre Pooladian <aramalexandre.pooladian@nyu.edu>.
Pseudocode | No | The paper presents mathematical formulations, proofs, and experimental results, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our numerical experiments were performed using Google Colab Pro, where our code is adapted from (Chizat et al., 2020) and is publicly available.
Open Datasets | Yes | We turn our attention to an application of map estimation using real-world data, where practitioners may not have a priori knowledge of a map even existing between the source and target measures. Such an example arises in (Demetci et al., 2021; Moriel et al., 2021; Schiebinger et al., 2019), where the task is to infer cellular evolution from population measurements.
Dataset Splits | Yes | Across a range of ε values, we perform the following experiment across 20 trials where the train/test split is 50/50.
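A minimal sketch of a repeated 50/50 split protocol, assuming the points are shuffled uniformly at random in each trial and that error bars are reported as mean ± one sample standard deviation over trials. Both choices, and the names `split_trials` and `summarize`, are our assumptions rather than details stated in the paper.

```python
import numpy as np


def split_trials(n_points, n_trials=20, train_frac=0.5, seed=0):
    """Yield one (train_idx, test_idx) pair per trial from a fresh random shuffle."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_points))
    for _ in range(n_trials):
        perm = rng.permutation(n_points)
        yield perm[:n_train], perm[n_train:]


def summarize(errors):
    """Mean and sample standard deviation (ddof=1) of per-trial errors."""
    errors = np.asarray(errors, dtype=float)
    return errors.mean(), errors.std(ddof=1)
```

In use, one would fit a map on the training indices of each trial, record its test error, and pass the 20 recorded errors to `summarize`.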
Hardware Specification | No | Our numerical experiments were performed using Google Colab Pro. (This indicates a computing environment but lacks specific hardware details such as GPU/CPU models or memory.)
Software Dependencies | No | We compute the (unregularized) W2 distance using the Python OT (POT) package (Flamary et al., 2021). (Mentions a package but no version number; no other software with a specific version is mentioned.)
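The paper computes the unregularized W2 with POT; with that package, the usual call for two uniformly weighted point clouds would be `ot.emd2(ot.unif(n), ot.unif(m), ot.dist(xs, xt))`, where `ot.dist` defaults to squared Euclidean costs. As a dependency-light alternative (a swapped-in technique, not the authors' code), the sketch below uses SciPy's `linear_sum_assignment`. It is exact only when both clouds have the same number of uniformly weighted points, because an optimal plan can then be taken to be a permutation. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_squared_uniform(x, y):
    """Exact squared W2 between two uniform empirical measures of equal size."""
    if x.shape != y.shape:
        raise ValueError("this shortcut needs equally sized point clouds")
    # Pairwise squared Euclidean costs.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    # With uniform weights and equal sizes, an optimal plan is a permutation,
    # so the optimal assignment gives the exact OT cost.
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].mean()
```

For unequal sample sizes or non-uniform weights, a general linear-programming solver such as POT's `ot.emd2` is needed instead.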
Experiment Setup | Yes | For both d = 5 and d = 10, we use $N_S = 50000$ points to approximate the MSE $\|\hat{T} - T_0\|^2_{L^2(P)}$ via Monte Carlo integration after having learned the maps. For fixed ε (we chose ε = 0.5 in both d = 5 and d = 10), we sample n points from the source (where n varies between 100 and 10000), and then re-sample n points from the source and map them according to T, effectively generating samples from the target distribution. We run this procedure 20 times to generate error bars on the plots. Across a range of ε values, we perform the following experiment across 20 trials where the train/test split is 50/50.
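The data-generation step and the Monte Carlo error estimate described above can be sketched as follows. This is a NumPy illustration under assumptions of our own: the source distribution (uniform on the unit cube), the ground-truth map $T_0$ (coordinate-wise exp, the gradient of the convex function $\sum_i e^{x_i}$), and all function names are hypothetical and not taken from the paper. Any estimator $\hat{T}$, for example an entropic map, can be passed in as a callable.

```python
import numpy as np

# Hypothetical setup (not the paper's): P = Unif([0, 1]^D) and T0 = exp applied
# coordinate-wise, which is the gradient of a convex function (a Brenier map).
D = 5


def sample_uniform_cube(n, rng):
    return rng.uniform(size=(n, D))


def T0_exp(x):
    return np.exp(x)


def make_training_data(sample_source, T0, n, rng):
    """Draw n source points and n independent target points T0(X')."""
    x = sample_source(n, rng)
    # Re-sample from the source and push forward by the ground-truth map,
    # so the source and target samples are independent of each other.
    y = T0(sample_source(n, rng))
    return x, y


def mc_mse(T_hat, T0, sample_source, n_mc=50_000, seed=0):
    """Monte Carlo estimate of ||T_hat - T0||^2_{L^2(P)} from n_mc fresh source points."""
    rng = np.random.default_rng(seed)
    z = sample_source(n_mc, rng)
    return np.mean(np.sum((T_hat(z) - T0(z)) ** 2, axis=1))
```

Drawing the target sample from a fresh source draw keeps the n source and n target points independent, so an estimator cannot exploit a hidden pairing; averaging over $N_S$ fresh points then approximates the $L^2(P)$ integral.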