Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Authors: Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alexander Peysakhovich, Aldo Pacchiano, Jakob Foerster

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show both theoretically and experimentally that our method, called Ridge Rider (RR), offers a promising direction for a variety of challenging problems." and "4 Experiments"
Researcher Affiliation | Collaboration | "Jack Parker-Holder, University of Oxford; Luke Metz, Google Research, Brain Team; Cinjon Resnick; Hengyuan Hu; Alistair Letcher; Alex Peysakhovich; Aldo Pacchiano; Jakob Foerster" and "Correspondence to jackph@robots.ox.ac.uk, jnf@fb.com"
Pseudocode | Yes | "In Algorithm 1 we show pseudo code for RR" and "See Algorithm 2 in the Appendix (Sec. C) for pseudocode." A minimal sketch of the RR loop is given below the table.
Open Source Code | Yes | "making our code available and testing the method on toy environments are important measures in this direction" and "To run this experiment, see the notebook at https://bit.ly/2XvEmZy."
Open Datasets | Yes | "RR for Supervised Learning: We applied approximate RR to MNIST", "RR for Out of Distribution Generalization: We test our extension of RR from Sec. 3 on OOD generalization using Colored MNIST [3]", and "MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010."
Dataset Splits | No | "We first produce a finite set of diverse solutions using only the training set and then use the validation data to choose the best one from these" and "test and train accuracy for MNIST". While validation data is mentioned, specific splits (percentages or counts) are not provided.
Hardware Specification | No | The paper does not contain specific details about the hardware used, such as GPU/CPU models or types.
Software Dependencies | No | "Since these terms only rely on Hessian-Vector-products, they can be calculated efficiently for large scale DNNs in any modern auto-diff library, e.g. Pytorch [41], Tensorflow [1] or Jax [6]." No version numbers are specified. A Hessian-vector-product example is given below the table.
Experiment Setup | Yes | "Our hyperparameters for this experiment were: S = 236, α = 0.00264, LRx = 0.000510, LRλ = 4.34e-6, batch size = 2236" and "We run RR with a maximum budget of T = 105 iterations, similarity δ_break = 0.95, and take only the top N = 6 in GetRidges."
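
For readers who want a concrete picture of the pseudocode referenced in the Pseudocode row, the following is a minimal sketch of the Ridge Rider loop in PyTorch. It uses an exact eigendecomposition of the full Hessian, which is only feasible for very small parameter vectors; the paper's approximate variant relies on Hessian-vector products instead. The helper names get_ridges and follow_ridge, the toy loss, and all values below are illustrative assumptions, not the authors' released implementation.

```python
import torch
from torch.autograd.functional import hessian

def get_ridges(loss_fn, theta, top_n=6):
    # Full Hessian eigendecomposition; only feasible for tiny models.
    H = hessian(loss_fn, theta)
    eigvals, eigvecs = torch.linalg.eigh(H)        # eigenvalues in ascending order
    ridges = []
    for i in range(min(top_n, theta.numel())):
        if eigvals[i] < 0:                         # negative-curvature directions ("ridges")
            ridges.append((eigvals[i].item(), eigvecs[:, i]))
    return ridges

def follow_ridge(loss_fn, theta, eigvec, alpha=1e-2, max_steps=200):
    # Step along the eigenvector, re-estimating it after every step.
    for _ in range(max_steps):
        theta = theta - alpha * eigvec
        eigvals, eigvecs = torch.linalg.eigh(hessian(loss_fn, theta))
        sims = eigvecs.T @ eigvec                  # re-attach to the most similar eigenvector
        idx = torch.argmax(sims.abs())
        eigvec = eigvecs[:, idx] * torch.sign(sims[idx])
        if eigvals[idx] >= 0:                      # curvature no longer negative: ridge ends
            break
    return theta

# Toy usage: a 2-D loss with a saddle at the origin (illustrative only).
loss_fn = lambda t: t[0] * t[1] + 0.1 * (t ** 4).sum()
theta0 = torch.zeros(2)
solutions = [follow_ridge(loss_fn, theta0, e) for _, e in get_ridges(loss_fn, theta0)]
```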
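
The Software Dependencies row quotes the paper's observation that RR only requires Hessian-vector products from an auto-diff library. Below is a minimal PyTorch sketch of such a product computed by double backpropagation; the placeholder model, data, and function name are assumptions for illustration, not code from the paper.

```python
import torch

def hessian_vector_product(loss, params, vec):
    # H v via double backprop: differentiate (grad(loss) . v) w.r.t. the parameters.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

# Illustrative usage with a tiny placeholder model.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
v = [torch.randn_like(p) for p in model.parameters()]
hv = hessian_vector_product(loss, list(model.parameters()), v)
```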