Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

Authors: Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Promising preliminary experiments complement our analysis. We provide preliminary empirical evidence of the effectiveness of the proposed approach. We present here experiments comparing the two Sinkhorn approximations empirically.
Researcher Affiliation | Academia | 1. Department of Computer Science, University College London, London, UK. 2. INRIA, Département d'informatique, École Normale Supérieure, PSL Research University, Paris, France. 3. Istituto Italiano di Tecnologia, Genova, Italy. 4. Department of Electrical and Electronic Engineering, Imperial College London, London, UK.
Pseudocode | Yes | Algorithm 1: Computation of ∇a Sλ(a, b). (A hedged sketch of the underlying computation is given after the table.)
Open Source Code | Yes | The implementation of this comparison is available online at https://github.com/GiulsLu/OT-gradients.
Open Datasets | Yes | Google Quick Draw. We compared the performance of the two estimators on a challenging dataset. We selected c = 2, 4, 10 classes from the Google Quick Draw dataset [38], which consists of images of size 28×28 pixels. [38] Google Inc. Quick Draw Dataset. https://github.com/googlecreativelab/quickdraw-dataset.
Dataset Splits | Yes | We trained the structured prediction estimators on 1000 images per class and tested on another 1000 images. We used a Gaussian kernel with bandwidth σ and regularization parameter γ selected by cross-validation. (A toy sketch of this selection step follows the table.)
Hardware Specification | Yes | Experiments were run on an Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz with 16GB RAM.
Software Dependencies | No | The paper mentions using 'efficient off-the-shelf implementations (BLAS, LAPACK)' and implies the use of Python for the linked code. However, it does not specify explicit version numbers for any software dependencies.
Experiment Setup | Yes | We empirically chose the Sinkhorn regularization parameter λ to be the smallest value such that the output Tλ of the Sinkhorn algorithm would be within 10^-6 of the transport polytope after 1000 iterations. We compared the gradient obtained with Alg. 1 and automatic differentiation (AD) on random histograms with different n (y-axis), m (x-axis), and regularization λ = 0.02. (An illustrative version of this comparison is sketched after the table.)
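
The paper's Algorithm 1 computes ∇a Sλ(a, b) for the sharp Sinkhorn cost Sλ(a, b) = ⟨Tλ(a, b), M⟩. The sketch below is not that algorithm: it is a minimal NumPy illustration, under our own naming, of the Sinkhorn iteration it builds on, together with the standard fact that the gradient of the regularized cost with respect to a equals the dual potential λ log u up to an additive constant.

```python
import numpy as np

def sinkhorn_grad_a(a, b, M, lam, n_iter=1000, tol=1e-6):
    """Sinkhorn iterations for histograms a (n,), b (m,) and cost matrix M (n, m).

    Returns the plan Tλ = diag(u) K diag(v) and the gradient of the
    *regularized* OT cost w.r.t. a, i.e. the dual potential lam * log(u)
    (defined up to an additive constant). Illustrative only: the paper's
    Algorithm 1 targets the sharp cost Sλ(a, b) = <Tλ, M> instead.
    """
    K = np.exp(-M / lam)                    # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                   # match column marginals to b
        u_new = a / (K @ v)                 # match row marginals to a
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    T = u[:, None] * K * v[None, :]         # transport plan Tλ
    grad_a = lam * np.log(u)                # dual potential
    return T, grad_a - grad_a.mean()        # fix the constant: zero-mean gradient
```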
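
The hyperparameter selection quoted in the Dataset Splits row (Gaussian kernel bandwidth σ and regularization γ chosen by cross-validation) can be illustrated with a toy stand-in. The sketch below uses plain scalar kernel ridge regression in place of the paper's structured prediction estimator; the data, grids, and names are our own assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

# Toy stand-in data; in the paper X would be 28x28 Quick Draw images and the
# estimator would be the structured prediction method, not scalar regression.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 784))
y = rng.standard_normal(200)

# Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); scikit-learn's
# RBF kernel is parameterized as exp(-g ||x - x'||^2), so g = 1 / (2 sigma^2).
sigmas = [1.0, 5.0, 10.0]                       # candidate bandwidths σ
gammas = [1e-4, 1e-3, 1e-2]                     # candidate regularizers γ
param_grid = {
    "gamma": [1.0 / (2 * s**2) for s in sigmas],
    "alpha": gammas,
}
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```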
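
The Experiment Setup row describes comparing the gradient from Alg. 1 against automatic differentiation on random histograms. A minimal sketch of the AD side of such a comparison, assuming PyTorch and an unrolled Sinkhorn loop (the function name, problem sizes, and the finite-difference sanity check are ours, not the paper's):

```python
import torch

def sharp_sinkhorn_cost(a, b, M, lam, n_iter=200):
    """Sharp Sinkhorn cost Sλ(a, b) = <Tλ, M>, written so that autograd can
    backpropagate through the unrolled Sinkhorn iterations."""
    K = torch.exp(-M / lam)
    u = torch.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return (T * M).sum()

n, m, lam = 30, 40, 0.02
torch.manual_seed(0)
a = torch.rand(n, dtype=torch.float64); a = a / a.sum(); a.requires_grad_(True)
b = torch.rand(m, dtype=torch.float64); b = b / b.sum()
M = torch.rand(n, m, dtype=torch.float64)

# Gradient w.r.t. a by automatic differentiation through the unrolled loop.
grad_ad, = torch.autograd.grad(sharp_sinkhorn_cost(a, b, M, lam), a)

# Sanity check via central finite differences along a simplex direction
# (on the simplex the gradient is only defined up to an additive constant,
# so we compare the difference between two coordinates).
eps, i, j = 1e-6, 0, 1
d = torch.zeros_like(a); d[i], d[j] = eps, -eps
with torch.no_grad():
    fd = (sharp_sinkhorn_cost(a + d, b, M, lam)
          - sharp_sinkhorn_cost(a - d, b, M, lam)) / (2 * eps)
print(float(grad_ad[i] - grad_ad[j]), float(fd))
```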