Learning with Differentiable Perturbed Optimizers
Authors: Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally the performance of our approach on various tasks. and, from Section 5 (Experiments): We demonstrate the usefulness of perturbed maximizers in a supervised learning setting, as described in Section 4. We focus on a classification task and on two structured prediction tasks, label ranking and learning to predict shortest paths. |
| Researcher Affiliation | Collaboration | Quentin Berthet, Google Research, Brain Team, Paris, France (qberthet@google.com); Mathieu Blondel, Google Research, Brain Team, Paris, France (mblondel@google.com); Olivier Teboul, Google Research, Brain Team, Paris, France (oliviert@google.com); Marco Cuturi, Google Research, Brain Team, Paris, France (cuturi@google.com); Jean-Philippe Vert, Google Research, Brain Team, Paris, France (jpvert@google.com); Francis Bach, INRIA, DI ENS, PSL Research University, Paris (francis.bach@inria.fr) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We will open-source a Python package allowing to turn any black-box solver into a differentiable function, in just a few lines of code. A minimal illustrative sketch of this idea is given after the table. |
| Open Datasets | Yes | We use the perturbed argmax with Gaussian noise in an image classification task on the CIFAR-10 dataset. and We use the same 21 datasets as in [28, 14]. |
| Dataset Splits | Yes | Results are averaged over 10-fold CV and parameters tuned by 5-fold CV. |
| Hardware Specification | No | The paper mentions "In our experiments on GPU" but does not specify any particular hardware models or specifications. |
| Software Dependencies | No | The paper mentions "a Python package" but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We train a vanilla-CNN with 10 network outputs that are the entries of θ; we minimize the Fenchel-Young loss between θ_i = g_w(x_i) and y_i, with different temperatures ε and numbers of perturbations M. and We optimize over 50 epochs with batches of size 70, temperature ε = 1 and M = 1 (single perturbation). A sketch of the corresponding loss gradient is given after the table. |
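
The "Open Source Code" row quotes the paper's plan to turn any black-box solver into a differentiable function. The core idea is to average the solver's output over random perturbations of its input, which smooths the otherwise piecewise-constant mapping. Below is a minimal NumPy sketch of this construction for the argmax solver, under stated assumptions; it is not the authors' released package, and the names `argmax_one_hot` and `perturbed_argmax` are illustrative.

```python
import numpy as np

def argmax_one_hot(theta):
    """Black-box solver: one-hot indicator of the largest entry of theta."""
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_argmax(theta, epsilon=1.0, n_samples=1000, rng=None):
    """Monte Carlo estimate of the perturbed maximizer
    y*_eps(theta) = E_Z[argmax_y <y, theta + eps * Z>], with Z ~ N(0, I).
    Averaging over Gaussian noise smooths the piecewise-constant argmax,
    so the output varies continuously with theta."""
    rng = np.random.default_rng() if rng is None else rng
    samples = [
        argmax_one_hot(theta + epsilon * rng.standard_normal(theta.shape))
        for _ in range(n_samples)
    ]
    return np.mean(samples, axis=0)

# Example: the smoothed output assigns weight to each entry of theta
# instead of a hard one-hot choice.
theta = np.array([1.0, 1.2, 0.8])
print(perturbed_argmax(theta, epsilon=1.0, n_samples=5000))
```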
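
For the experiment setup quoted above, the network is trained by minimizing a Fenchel-Young loss on the scores θ = g_w(x). A convenient property of this loss family is that its gradient with respect to θ is the difference between the perturbed maximizer and the target, so training only needs forward calls to the perturbed solver. The following hedged sketch reuses `perturbed_argmax` from the previous block; the helper name `fenchel_young_grad` is an assumption for illustration, not the paper's API.

```python
def fenchel_young_grad(theta, y_true, epsilon=1.0, n_samples=1, rng=None):
    """Gradient of the Fenchel-Young loss with respect to theta:
    y*_eps(theta) - y_true, so no backpropagation through the solver
    itself is required. n_samples=1 and epsilon=1.0 mirror the
    M = 1, ε = 1 setting quoted in the table above.
    Reuses perturbed_argmax defined in the previous sketch."""
    return perturbed_argmax(theta, epsilon=epsilon, n_samples=n_samples, rng=rng) - y_true

# Hypothetical use inside a training step: with theta = g_w(x) (the 10 CNN
# outputs) and y_onehot the one-hot label,
#   grad_theta = fenchel_young_grad(theta, y_onehot, epsilon=1.0, n_samples=1)
# is then backpropagated through the network weights w as usual.
```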