Low-rank Optimal Transport: Approximation, Statistics and Debiasing

Authors: Meyer Scetbon, Marco Cuturi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we illustrate experimentally our theoretical findings and show how our initialization provide practical improvements. For that purpose we consider 3 synthetic problems and one real world dataset to: (i) provide illustrations on the statistical rates of LOTr,c, (ii) exhibit the gradient flow of the debiased formulation DLOTr,c, (iii) use the clustering method induced by LOTr,c, and (iv) show the effect of the initialization."
Researcher Affiliation | Collaboration | Meyer Scetbon (CREST, ENSAE; meyer.scetbon@ensae.fr) and Marco Cuturi (Apple and CREST, ENSAE; cuturi@apple.com)
Pseudocode | No | The paper describes its algorithms (e.g., the mirror-descent scheme and Dykstra's algorithm) in text and mathematical formulations, but it contains no structured pseudocode or algorithm block.
Open Source Code | No | The checklist answers 'Yes' to 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?', but the main body provides neither a link to a source-code repository nor an explicit availability statement (e.g., 'We release our code at...' or 'The source code is available in the supplementary material').
Open Datasets | Yes | "In this experiment, we consider the Newsgroup20 dataset [Pedregosa et al., 2011] constituted of texts and we embed them into distributions in 50D using the same pre-processing steps as in [Cuturi et al., 2022]. We compare different initialization when applying the algorithm of [Scetbon et al., 2021] to compare random texts viewed as distributions for multiple choices of rank r."
Dataset Splits | No | The checklist states that training details, including data splits, are provided in Section 7, but that section describes only the datasets and experimental setup, without explicitly specifying training, validation, or test splits (percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | "All experiments were run on a Mac Book Pro 2019 laptop."
Software Dependencies | No | The paper mentions the 'python package scikit-learn' and the 'JAX toolbox for all things Wasserstein', but it provides no version numbers for these or any other software components, which a reproducible description of ancillary software requires.
Experiment Setup | Yes | "We recommend to set such a global γ ∈ [1, 10], and observe that this range works whatever the problem considered. ... In practice we fix ε = 1/10 and we then initialize LOTr,c using (Q, R) solution of (9) and g := Q^T 1_r (= R^T 1_r)."
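The quoted setup refers to the low-rank factors (Q, R) and the shared inner marginal g = Q^T 1_r (= R^T 1_r). As a minimal sketch (not the authors' released code), assuming the standard low-rank parametrization P = Q diag(1/g) R^T from Scetbon et al. [2021], with Q ∈ Π(a, g) and R ∈ Π(b, g), the coupling and its transport cost can be assembled as follows; the toy factors below are chosen for illustration only:

```python
import numpy as np

def low_rank_coupling(Q, R):
    """Assemble P = Q diag(1/g) R^T from low-rank factors.

    Q has shape (n, r) with rows summing to a; R has shape (m, r) with
    rows summing to b; both share the inner marginal g = Q^T 1_n = R^T 1_m.
    """
    g = Q.sum(axis=0)  # inner marginal, shape (r,)
    assert np.allclose(g, R.sum(axis=0)), "factors must share the inner marginal"
    return (Q / g) @ R.T  # shape (n, m)

def low_rank_cost(C, Q, R):
    """Transport cost <C, P> under the low-rank coupling."""
    return np.sum(C * low_rank_coupling(Q, R))

# Toy sanity check: factors built as outer products with an inner marginal w
# (summing to 1) yield the independent coupling P = a b^T, whose marginals
# must be exactly (a, b).
n, m = 5, 4
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
w = np.array([0.3, 0.7])  # inner marginal g, rank r = 2
Q = np.outer(a, w)        # Q in Pi(a, g): rows sum to a, columns to g
R = np.outer(b, w)        # R in Pi(b, g): rows sum to b, columns to g
P = low_rank_coupling(Q, R)
print(np.allclose(P.sum(axis=1), a), np.allclose(P.sum(axis=0), b))
```

This only checks the marginal bookkeeping of the factorization; the paper's recommended initialization additionally solves problem (9) (with ε = 1/10) to obtain (Q, R) before reading off g.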