Learning with Differentiable Perturbed Optimizers

Authors: Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate experimentally the performance of our approach on various tasks." and "5 Experiments: We demonstrate the usefulness of perturbed maximizers in a supervised learning setting, as described in Section 4. We focus on a classification task and on two structured prediction tasks, label ranking and learning to predict shortest paths."
Researcher Affiliation | Collaboration | Quentin Berthet (Google Research, Brain Team, Paris, France, qberthet@google.com); Mathieu Blondel (Google Research, Brain Team, Paris, France, mblondel@google.com); Olivier Teboul (Google Research, Brain Team, Paris, France, oliviert@google.com); Marco Cuturi (Google Research, Brain Team, Paris, France, cuturi@google.com); Jean-Philippe Vert (Google Research, Brain Team, Paris, France, jpvert@google.com); Francis Bach (INRIA, DI ENS, PSL Research University, Paris, France, francis.bach@inria.fr)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "We will open-source a Python package allowing to turn any black-box solver into a differentiable function, in just a few lines of code." (An illustrative sketch of this idea follows the table.)
Open Datasets | Yes | "We use the perturbed argmax with Gaussian noise in an image classification task on the CIFAR-10 dataset." and "We use the same 21 datasets as in [28, 14]."
Dataset Splits | Yes | "Results are averaged over 10-fold CV and parameters tuned by 5-fold CV."
Hardware Specification | No | The paper mentions "In our experiments on GPU" but does not specify any particular hardware models or specifications.
Software Dependencies | No | The paper mentions "a Python package" but does not specify any software names with version numbers for reproducibility.
Experiment Setup | Yes | "We train a vanilla CNN with 10 network outputs that are the entries of θ; we minimize the Fenchel-Young loss between θ_i = g_w(x_i) and y_i, with different temperatures ε and numbers of perturbations M." and "We optimize over 50 epochs with batches of size 70, temperature ε = 1 and M = 1 (single perturbation)." (See the training-gradient sketch after the table.)
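
The open-source-code row quotes the paper's claim that any black-box solver can be turned into a differentiable function. A minimal NumPy sketch of that idea is below: it estimates the perturbed maximizer y_ε(θ) = E[solver(θ + εZ)] with Gaussian noise Z by Monte Carlo. The function names `perturbed_argmax` and `one_hot_argmax` are illustrative only and are not the API of the package the authors released.

```python
import numpy as np

def one_hot_argmax(theta):
    """A black-box discrete solver: the one-hot argmax of a score vector."""
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_argmax(theta, solver=one_hot_argmax, epsilon=1.0,
                     n_samples=1000, rng=None):
    """Monte Carlo estimate of the perturbed maximizer
    y_eps(theta) = E[solver(theta + epsilon * Z)],  Z ~ N(0, I).
    Averaging the solver's outputs over Gaussian perturbations gives a
    smooth surrogate of the piecewise-constant black-box solver."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal((n_samples,) + theta.shape)
    return np.mean([solver(theta + epsilon * z) for z in noise], axis=0)

# Example: the hard argmax of [1.0, 2.0, 0.5] is [0, 1, 0]; the perturbed
# version returns a smoothed frequency vector over vertices that sums to 1.
print(perturbed_argmax(np.array([1.0, 2.0, 0.5])))
```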
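For the experiment-setup row, the paper trains with a perturbed Fenchel-Young loss whose gradient with respect to the scores θ is y_ε(θ) − y, which is what makes training with a single perturbation (M = 1) practical. A hedged sketch building on `perturbed_argmax` above; `fenchel_young_grad` is a hypothetical helper name, not the released package's API.

```python
def fenchel_young_grad(theta, y_true, solver=one_hot_argmax,
                       epsilon=1.0, n_samples=1):
    """Stochastic gradient of the perturbed Fenchel-Young loss w.r.t. theta:
    grad = y_eps(theta) - y_true.  With n_samples=1 this mirrors the
    single-perturbation (M = 1) setting quoted in the experiment setup;
    larger n_samples (M) reduces the variance of the estimate."""
    y_eps = perturbed_argmax(theta, solver, epsilon=epsilon,
                             n_samples=n_samples)
    return y_eps - y_true

# In a training loop, this gradient with respect to the network outputs
# theta = g_w(x) would be backpropagated through the CNN by the chain rule.
```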