Fast Differentiable Sorting and Ranking
Authors: Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we confirm that our approach is an order of magnitude faster than existing approaches and showcase two novel applications: differentiable Spearman's rank correlation coefficient and least trimmed squares. We present in this section our empirical findings. NumPy, JAX, PyTorch and TensorFlow versions of our sorting and ranking operators are available at https://github.com/google-research/fast-soft-sort/. 6.1. Top-k classification loss function. Experimental setup. To demonstrate the effectiveness of our proposed soft rank operators as a drop-in replacement for existing ones, we reproduce the top-k classification experiment of Cuturi et al. (2019). (A soft-rank-based Spearman loss is sketched after the table.) |
| Researcher Affiliation | Industry | Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga (Google Research, Brain team). Correspondence to: Mathieu Blondel <mblondel@google.com>, Olivier Teboul <oliviert@google.com>, Quentin Berthet <qberthet@google.com>, Josip Djolonga <josipd@google.com>. |
| Pseudocode | No | The paper describes algorithms such as the PAV algorithm in text but does not include any formal pseudocode blocks or labeled algorithm figures. |
| Open Source Code | Yes | NumPy, JAX, PyTorch and TensorFlow versions of our sorting and ranking operators are available at https://github.com/google-research/fast-soft-sort/. |
| Open Datasets | Yes | We use the CIFAR-10 and CIFAR-100 datasets, with n = 10 and n = 100 classes, respectively. We consider datasets from the LIBSVM archive (Fan & Lin, 2011). |
| Dataset Splits | Yes | Following (Korba et al., 2018), we average over two 10-fold validation runs, in each of which we train on 90% and evaluate on 10% of the data. Within each repetition, we run an internal 5-fold cross-validation to grid-search for the best parameters. We hold out 20% of the data as test set and use the rest as training set. We artificially create outliers by adding noise to a certain percentage of the training labels, using yᵢ ← yᵢ + e, where e ∼ N(0, 5 · std(y)). We do not add noise to the test set. For hyper-parameter optimization, we use 5-fold cross-validation. (The label-corruption step is sketched in code after the table.) |
| Hardware Specification | Yes | We run this experiment on top of TensorFlow (Abadi et al., 2016) on a six-core Intel Xeon W-2135 with 64 GB of RAM and a GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions software like NumPy, JAX, PyTorch, TensorFlow, and scikit-learn but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Following Cuturi et al. (2019), we use a vanilla CNN (4 Conv2D layers with 2 max-pooling layers, ReLU activations, and 2 fully connected layers with batch norm on each), the ADAM optimizer (Kingma & Ba, 2014) with a constant step size of 10⁻⁴, and set k = 1. We use L-BFGS (Liu & Nocedal, 1989) with a maximum of 300 iterations. For hyper-parameter optimization, we use 5-fold cross-validation. We choose k from {⌈0.1n⌉, ⌈0.2n⌉, ..., ⌈0.5n⌉}, ε from 10 log-spaced values between 10⁻³ and 10⁴, and τ from 5 linearly spaced values between 1.3 and 2. (The grids are written out in code after the table.) |
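
The differentiable Spearman's rank correlation mentioned in the Research Type row can be built by feeding a soft rank operator into the usual Pearson-correlation-of-ranks formula. The sketch below is a minimal illustration, assuming the released library exposes `soft_rank` in `fast_soft_sort.pytorch_ops` with a `regularization_strength` argument; the exact API should be checked against the repository, and the helper name `spearman_loss` is ours, not the authors'.

```python
import torch
from fast_soft_sort.pytorch_ops import soft_rank


def spearman_loss(pred, target, regularization_strength=1.0):
    """Negative soft Spearman correlation, averaged over a batch of rows."""
    # Soft (differentiable) ranks of the predictions, shape (batch, n).
    pred_rank = soft_rank(pred, regularization_strength=regularization_strength)
    # Hard ranks of the targets; no gradient is needed through the labels.
    target_rank = torch.argsort(torch.argsort(target, dim=1), dim=1).float() + 1.0

    # Pearson correlation between the two rank vectors, row by row.
    pred_rank = pred_rank - pred_rank.mean(dim=1, keepdim=True)
    target_rank = target_rank - target_rank.mean(dim=1, keepdim=True)
    cov = (pred_rank * target_rank).sum(dim=1)
    denom = pred_rank.norm(dim=1) * target_rank.norm(dim=1)
    return -(cov / denom).mean()
```

Minimizing this loss pushes the predicted ordering toward the label ordering; a larger `regularization_strength` gives smoother but less exact soft ranks.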
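The outlier-injection protocol in the Dataset Splits row (corrupt a fraction of the training labels with Gaussian noise of scale 5·std(y), leave the test set clean) translates into a few lines of NumPy. This is an illustrative sketch, not the authors' code; `corrupt_labels` and its arguments are hypothetical names.

```python
import numpy as np


def corrupt_labels(y_train, fraction, seed=0):
    """Return a copy of y_train where `fraction` of the entries are outliers."""
    rng = np.random.default_rng(seed)
    y_noisy = y_train.copy().astype(float)
    n_outliers = int(fraction * len(y_train))
    idx = rng.choice(len(y_train), size=n_outliers, replace=False)
    # e ~ N(0, 5 * std(y)), as in the quoted setup; test labels stay untouched.
    y_noisy[idx] += rng.normal(0.0, 5.0 * y_train.std(), size=n_outliers)
    return y_noisy
```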
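The hyper-parameter grids in the Experiment Setup row can be written out directly in NumPy. The sketch below assumes `n` is the training-set size and uses our own variable names (`k_grid`, `eps_grid`, `tau_grid`).

```python
import numpy as np

n = 1000  # training-set size (placeholder)

# k from {ceil(0.1 n), ceil(0.2 n), ..., ceil(0.5 n)}
k_grid = [int(np.ceil(f * n)) for f in (0.1, 0.2, 0.3, 0.4, 0.5)]

# epsilon: 10 log-spaced values between 10^-3 and 10^4
eps_grid = np.logspace(-3, 4, num=10)

# tau: 5 linearly spaced values between 1.3 and 2
tau_grid = np.linspace(1.3, 2.0, num=5)
```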