Low-rank Optimal Transport: Approximation, Statistics and Debiasing

Authors: Meyer Scetbon, Marco Cuturi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we illustrate experimentally our theoretical findings and show how our initialization provide practical improvements. For that purpose we consider 3 synthetic problems and one real world dataset to: (i) provide illustrations on the statistical rates of LOTr,c, (ii) exhibit the gradient flow of the debiased formulation DLOTr,c, (iii) use the clustering method induced by LOTr,c, and (iv) show the effect of the initialization."
Researcher Affiliation | Collaboration | Meyer Scetbon (CREST, ENSAE; meyer.scetbon@ensae.fr) and Marco Cuturi (Apple and CREST, ENSAE; cuturi@apple.com)
Pseudocode | No | The paper describes its algorithms (e.g., the mirror-descent scheme and Dykstra's algorithm) in text and mathematical formulations, but it contains no structured pseudocode or algorithm block.
Open Source Code | No | The checklist answers 'Yes' to 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?', but the main body provides neither a link to a source-code repository nor an explicit availability statement (e.g., 'We release our code at...' or 'The source code is available in the supplementary material').
Open Datasets | Yes | "In this experiment, we consider the Newsgroup20 dataset [Pedregosa et al., 2011] constituted of texts and we embed them into distributions in 50D using the same pre-processing steps as in [Cuturi et al., 2022]. We compare different initialization when applying the algorithm of [Scetbon et al., 2021] to compare random texts viewed as distributions for multiple choices of rank r."
Dataset Splits | No | The checklist states that training details, including data splits, are provided in Section 7, but that section describes only the datasets and experimental setup, without explicitly specifying training, validation, or test splits (percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | "All experiments were run on a Mac Book Pro 2019 laptop."
Software Dependencies | No | The paper mentions the 'python package scikit-learn' and the 'JAX toolbox for all things Wasserstein', but it provides no version numbers for these or any other software components, which a reproducible description of ancillary software requires.
Experiment Setup | Yes | "We recommend to set such a global γ ∈ [1, 10], and observe that this range works whatever the problem considered. ... In practice we fix ε = 1/10 and we then initialize LOTr,c using (Q, R) solution of (9) and g := Q^T 1_r (= R^T 1_r)."
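The quoted setup refers to the low-rank factors (Q, R) and the shared inner marginal g = Q^T 1_r (= R^T 1_r). As a minimal sketch (not the authors' released code), assuming the standard low-rank parametrization P = Q diag(1/g) R^T from Scetbon et al. [2021], with Q ∈ Π(a, g) and R ∈ Π(b, g), the coupling and its transport cost can be assembled as follows; the toy factors below are chosen for illustration only:

```python
import numpy as np

def low_rank_coupling(Q, R):
    """Assemble P = Q diag(1/g) R^T from low-rank factors.

    Q has shape (n, r) with rows summing to a; R has shape (m, r) with
    rows summing to b; both share the inner marginal g = Q^T 1_n = R^T 1_m.
    """
    g = Q.sum(axis=0)  # inner marginal, shape (r,)
    assert np.allclose(g, R.sum(axis=0)), "factors must share the inner marginal"
    return (Q / g) @ R.T  # shape (n, m)

def low_rank_cost(C, Q, R):
    """Transport cost <C, P> under the low-rank coupling."""
    return np.sum(C * low_rank_coupling(Q, R))

# Toy sanity check: factors built as outer products with an inner marginal w
# (summing to 1) yield the independent coupling P = a b^T, whose marginals
# must be exactly (a, b).
n, m = 5, 4
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
w = np.array([0.3, 0.7])  # inner marginal g, rank r = 2
Q = np.outer(a, w)        # Q in Pi(a, g): rows sum to a, columns to g
R = np.outer(b, w)        # R in Pi(b, g): rows sum to b, columns to g
P = low_rank_coupling(Q, R)
print(np.allclose(P.sum(axis=1), a), np.allclose(P.sum(axis=0), b))
```

This only checks the marginal bookkeeping of the factorization; the paper's recommended initialization additionally solves problem (9) (with ε = 1/10) to obtain (Q, R) before reading off g.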