Differentiable Ranking and Sorting using Optimal Transport
Authors: Marco Cuturi, Olivier Teboul, Jean-Philippe Vert
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a vanilla CNN (4 Conv2D with 2 max-pooling layers, ReLU activation, 2 fully connected layers, batchnorm on each) and a Resnet18 on CIFAR-10 and CIFAR-100. Fig. 4 and 5 report test-set classification accuracies / epochs. |
| Researcher Affiliation | Industry | Marco Cuturi Olivier Teboul Jean-Philippe Vert Google Research, Brain Team {cuturi,oliviert,jpvert}@google.com |
| Pseudocode | Yes | Algorithm 1 (Sinkhorn). Inputs: a, b, x, y, ε, h, η. C_xy ← [h(y_j − x_i)]_ij; K ← e^(−C_xy/ε), u = 1_n; repeat v ← b ⊘ Kᵀu, u ← a ⊘ Kv until Δ(v ∘ Kᵀu, b) < η; Result: u, v, K. (A hedged NumPy sketch of these iterations follows the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. It mentions using 'the code kindly made available by the authors' for one experiment setup, but this refers to prior work's code, not the paper's own. |
| Open Datasets | Yes | We train a vanilla CNN... on CIFAR-10 and CIFAR-100. We use the MNIST experiment setup... |
| Dataset Splits | No | The paper mentions training and test sets but does not provide specific details on validation dataset splits (percentages, counts, or methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'ADAM optimizer' but does not specify software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | We use the ADAM optimizer with a constant stepsize set to 10⁻⁴. We used ε = 10⁻³, η = 10⁻³, a squared distance cost h(u) = u², and a stepsize of 10⁻⁴ with the ADAM optimizer. We train a vanilla CNN (4 Conv2D with 2 max-pooling layers, ReLU activation, 2 fully connected layers, batchnorm on each) and a Resnet18... We use 100 epochs... We set ε = 0.005. Our proposal is to consider the soft τ-quantile operator q_ε(x; τ, t) defined in (5), using for the filler weight t = 1/512. This is labelled as ε = 10⁻². We consider the same regressor architecture, namely a 2-hidden-layer NN with hidden layer size 64, ADAM optimizer and steplength 10⁻⁴. (A hedged PyTorch sketch of the CNN and optimizer settings follows the table.) |
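
The Sinkhorn pseudocode quoted in the table maps onto a few lines of NumPy. The sketch below is a reconstruction for illustration only, not the authors' released code: the function name `sinkhorn`, the use of `np.square` as the default cost h, and the L1 marginal error standing in for the stopping criterion Δ(v ∘ Kᵀu, b) < η are assumptions made here.

```python
import numpy as np

def sinkhorn(a, b, x, y, eps=1e-3, eta=1e-3, h=np.square):
    """Sinkhorn iterations following Algorithm 1 quoted above (NumPy sketch).

    a, b : weight vectors on the input values x and the target anchors y.
    h    : convex cost applied to pairwise differences; the paper's experiments
           use the squared distance h(u) = u^2.
    eps  : entropic regularization ε.
    eta  : stopping tolerance η.
    """
    C = h(y[None, :] - x[:, None])     # C_xy = [h(y_j - x_i)]_ij, shape (n, m)
    K = np.exp(-C / eps)               # elementwise Gibbs kernel e^(-C_xy/ε)
    u = np.ones_like(a)
    while True:
        v = b / (K.T @ u)              # column scaling: v ← b ⊘ Kᵀu
        u = a / (K @ v)                # row scaling:    u ← a ⊘ Kv
        marginal = v * (K.T @ u)       # current column marginal v ∘ Kᵀu
        # Stopping rule: the pseudocode checks Δ(v ∘ Kᵀu, b) < η; the L1
        # discrepancy used here is an assumption about Δ.
        if np.abs(marginal - b).sum() < eta:
            return u, v, K

# Example call with uniform weights; inputs are kept in [0, 1] and ε is kept
# moderate so that e^(-C/ε) stays well away from underflow.
n = 10
x = np.random.rand(n)                  # values to rank / sort softly
y = np.linspace(0.0, 1.0, n)           # sorted target anchors
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
u, v, K = sinkhorn(a, b, x, y, eps=0.05)
```

For very small ε the kernel e^(−C/ε) underflows in double precision, which is why the example call uses a moderate ε; log-domain (stabilized) updates would be needed to run the paper's ε = 10⁻³ on arbitrary inputs.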
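
The experiment-setup row pins down the optimizer (ADAM with constant step size 10⁻⁴) and the coarse shape of the vanilla CNN, but not the channel widths, kernel sizes, or fully connected hidden size. The sketch below fills those gaps with placeholder values: `VanillaCNN`, `width`, and the hidden size 256 are illustrative assumptions, and PyTorch is used only for concreteness since the excerpt does not name the authors' framework.

```python
import torch
import torch.nn as nn

# Hedged sketch of the "vanilla CNN" quoted in the experiment-setup row:
# 4 Conv2D layers, 2 max-pooling layers, ReLU activations, 2 fully connected
# layers, batch norm on each. Channel widths, kernel sizes, and the hidden FC
# size are NOT stated in the excerpt; the values below are placeholders.
class VanillaCNN(nn.Module):
    def __init__(self, num_classes=10, width=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(width, 2 * width, 3, padding=1), nn.BatchNorm2d(2 * width), nn.ReLU(),
            nn.Conv2d(2 * width, 2 * width, 3, padding=1), nn.BatchNorm2d(2 * width), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * width * 8 * 8, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        # Expects CIFAR-sized inputs (3 x 32 x 32); two poolings leave 8 x 8 maps.
        return self.classifier(self.features(x))

# Optimizer settings quoted from the paper: ADAM with a constant step size of 10^-4.
model = VanillaCNN(num_classes=10)   # 10 classes for CIFAR-10, 100 for CIFAR-100
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Setting `num_classes=100` gives the CIFAR-100 variant; the Resnet18 baseline mentioned in the same row is available off the shelf as `torchvision.models.resnet18` and is not re-sketched here.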