Differentiable Ranking and Sorting using Optimal Transport

Authors: Marco Cuturi, Olivier Teboul, Jean-Philippe Vert

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We train a vanilla CNN (4 Conv2D with 2 max-pooling layers, ReLU activation, 2 fully connected layers, batchnorm on each) and a ResNet18 on CIFAR-10 and CIFAR-100. Fig. 4 and 5 report test-set classification accuracies / epochs.
Researcher Affiliation | Industry | Marco Cuturi, Olivier Teboul, Jean-Philippe Vert; Google Research, Brain Team; {cuturi,oliviert,jpvert}@google.com
Pseudocode | Yes | Algorithm 1 (Sinkhorn). Inputs: a, b, x, y, ε, h, η. C_xy ← [h(y_j − x_i)]_ij; K ← exp(−C_xy/ε); u = 1_n; repeat v ← b / (K^T u), u ← a / (K v) until Δ(v ∘ K^T u, b) < η; Result: u, v, K. (A minimal sketch of this loop appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code for its own methodology. It mentions using 'the code kindly made available by the authors' of prior work for one experiment setup, but does not release code for its own method.
Open Datasets | Yes | We train a vanilla CNN... on CIFAR-10 and CIFAR-100. We use the MNIST experiment setup...
Dataset Splits | No | The paper mentions training and test sets but does not provide specific details on validation dataset splits (percentages, counts, or methodology).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the 'ADAM optimizer' but does not specify software names with version numbers needed for reproducibility (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We use the ADAM optimizer with a constant stepsize set to 10^-4. We used ε = 10^-3, η = 10^-3, a squared distance cost h(u) = u^2, and a stepsize of 10^-4 with the ADAM optimizer. We train a vanilla CNN (4 Conv2D with 2 max-pooling layers, ReLU activation, 2 fully connected layers, batchnorm on each) and a ResNet18... We use 100 epochs... We set ε = 0.005. Our proposal is to consider the soft τ-quantile q_ε(x; τ, t) operator defined in (5), using for the filler weight t = 1/512. This is labelled as ε = 10^-2. We consider the same regressor architecture, namely a 2-hidden-layer NN with hidden layer size 64, ADAM optimizer and steplength 10^-4. (A hedged sketch of this configuration appears after the table.)
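
The Sinkhorn routine quoted in the Pseudocode row can be written compactly. Below is a minimal NumPy sketch, assuming the squared-difference cost and the ε, η values quoted in the Experiment Setup row; the exact discrepancy measure Δ used for the stopping test is not spelled out in the excerpt, so an L1 gap on the column marginal is used here as a stand-in.

```python
import numpy as np

def sinkhorn(a, b, x, y, eps=1e-3, eta=1e-3, h=lambda u: u ** 2, max_iter=10_000):
    """Minimal sketch of the Sinkhorn routine quoted in the table.

    a, b : weight vectors over the inputs x and the anchors y (each sums to 1)
    eps  : entropic regularization (epsilon in the table)
    eta  : tolerance on the column-marginal violation (eta in the table)
    h    : convex cost on pairwise differences; squared cost by default
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    x, y = np.asarray(x, float), np.asarray(y, float)

    C = h(y[None, :] - x[:, None])   # C_xy = [h(y_j - x_i)]_ij, shape (n, m)
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(max_iter):
        v = b / (K.T @ u)            # column scaling update
        u = a / (K @ v)              # row scaling update
        # Stop when the column marginal v * (K^T u) is close enough to b
        # (L1 gap used here; the paper's exact discrepancy is not quoted).
        if np.abs(v * (K.T @ u) - b).sum() < eta:
            break
    return u, v, K
```

The resulting scalings define the regularized transport plan diag(u) K diag(v), from which the paper's soft ranks and sorted values can then be read off.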
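The Experiment Setup row pins down the optimizer and a coarse CNN architecture but not filter counts or kernel sizes. The sketch below is one plausible PyTorch reading of that description: the channel widths (32/64), 3x3 kernels, hidden size 256, and layer ordering are assumptions rather than values from the paper; only the layer types and the ADAM step size of 1e-4 are quoted from the table.

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction of the "vanilla CNN" described in the table:
# 4 Conv2D layers, 2 max-pooling layers, ReLU activations, batch norm on each
# conv layer, and 2 fully connected layers. Channel widths, kernel sizes, and
# the hidden size are assumptions; the excerpt does not state them.
vanilla_cnn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.MaxPool2d(2),                    # 32x32 -> 16x16
    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(2),                    # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
    nn.Linear(256, 10),                 # 10 classes for CIFAR-10
)

# Constant step size of 1e-4 with ADAM, as quoted in the Experiment Setup row.
optimizer = torch.optim.Adam(vanilla_cnn.parameters(), lr=1e-4)
```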