Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
Authors: Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alexander Peysakhovich, Aldo Pacchiano, Jakob Foerster
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show both theoretically and experimentally that our method, called Ridge Rider (RR), offers a promising direction for a variety of challenging problems." and Section 4: Experiments |
| Researcher Affiliation | Collaboration | Jack Parker-Holder, University of Oxford; Luke Metz, Google Research, Brain Team; Cinjon Resnick; Hengyuan Hu; Alistair Letcher; Alex Peysakhovich; Aldo Pacchiano; Jakob Foerster. "Correspondence to jackph@robots.ox.ac.uk, jnf@fb.com" |
| Pseudocode | Yes | In Algorithm 1 we show pseudo code for RR and See Algorithm 2 in the Appendix (Sec. C) for pseudocode. |
| Open Source Code | Yes | "making our code available and testing the method on toy environments are important measures in this direction." and "To run this experiment, see the notebook at https://bit.ly/2XvEmZy." |
| Open Datasets | Yes | "RR for Supervised Learning: We applied approximate RR to MNIST" and "RR for Out of Distribution Generalization: We test our extension of RR from Sec 3 on OOD generalization using Colored MNIST [3]." and "MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010." |
| Dataset Splits | No | "We first produce a finite set of diverse solutions using only the training set and then use the validation data to choose the best one from these." and "test and train accuracy for MNIST." While validation data is mentioned, specific splits (percentages or counts) are not provided. |
| Hardware Specification | No | The paper does not contain specific details about the hardware used, such as GPU/CPU models or types. |
| Software Dependencies | No | Since these terms only rely on Hessian-Vector-products, they can be calculated efficiently for large scale DNNs in any modern auto-diff library, e.g. Pytorch [41], Tensorflow [1] or Jax [6]. No version numbers are specified. |
| Experiment Setup | Yes | "Our hyperparameters for this experiment were: S = 236, = 0.00264, LRx = 0.000510, LRλ = 4.34e-6, batch size = 2236." and "We run RR with a maximum budget of T = 10^5 iterations, similarity threshold δbreak = 0.95, and take only the top N = 6 in GetRidges." |
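The Software Dependencies row quotes the paper's claim that its update terms only require Hessian-vector products, which any modern auto-diff library can compute without materializing the full Hessian. A minimal sketch of that idea is below; the loss function, direction vector, and finite-difference gradient-based HVP are illustrative stand-ins for the exact autodiff HVP a library like PyTorch, TensorFlow, or JAX would provide, not the paper's implementation.

```python
import numpy as np

def grad(w):
    # Analytic gradient of f(w) = sum(sin(w) * w**2).
    # In practice an auto-diff library supplies this for an arbitrary DNN loss.
    return np.cos(w) * w**2 + 2 * w * np.sin(w)

def hvp(grad_fn, w, v, eps=1e-5):
    # Hessian-vector product via central differences of the gradient:
    #   H v ≈ (∇f(w + eps*v) - ∇f(w - eps*v)) / (2*eps)
    # Cost: two gradient evaluations, never the d x d Hessian itself.
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

w = np.array([0.5, -1.0, 2.0])   # illustrative parameter vector
v = np.array([1.0, 0.0, 0.0])    # illustrative direction
print(hvp(grad, w, v))
```

Because only matrix-vector products with the Hessian are needed, methods like Ridge Rider can follow eigenvectors of the Hessian at large scale using repeated HVPs (e.g. power iteration) rather than an explicit eigendecomposition.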