The Memory-Perturbation Equation: Understanding Model's Sensitivity to Data
Authors: Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. |
| Researcher Affiliation | Academia | Peter Nickl peter.nickl@riken.jp Lu Xu lu.xu.sw@riken.jp Dharmesh Tailor d.v.tailor@uva.nl Thomas Möllenhoff thomas.moellenhoff@riken.jp Mohammad Emtiyaz Khan emtiyaz.khan@riken.jp RIKEN Center for AI Project, Tokyo, Japan. University of Amsterdam, Amsterdam, Netherlands. |
| Pseudocode | No | The information is insufficient. The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All details of the experimental setup are included in App. I and the code is available at https://github.com/team-approx-bayes/memory-perturbation. |
| Open Datasets | Yes | We show results for three datasets, each using a different architecture but all trained using SGD. To estimate the Hessian H and compute vᵢ = ∇fᵢ(θ)ᵀ H⁻¹ ∇fᵢ(θ), we use a Kronecker-factored (K-FAC) approximation implemented in the laplace [11] and ASDL [39] packages. |
| Dataset Splits | Yes | The approximation eliminates the need to train N models to perform CV, rather just uses eᵢₜ and vᵢₜ which are extremely cheap to compute within algorithms such as ON, RMSprop, and SGD. Leave-group-out (LGO) estimates can also be built, for example, by using Eq. 14, which enables us to understand the effect of leaving out a big chunk of training data, for example, an entire class for classification. |
| Hardware Specification | No | The information is insufficient. The paper does not specify the exact hardware (e.g., GPU/CPU models, specific cloud instances) used for running the experiments. |
| Software Dependencies | No | The information is insufficient. The paper mentions using 'laplace [11] and ASDL [39] packages' but does not specify their version numbers. |
| Experiment Setup | Yes | All details of the experimental setup are included in App. I and the code is available at https://github.com/team-approx-bayes/memory-perturbation. |
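The sensitivity quoted above, vᵢ = ∇fᵢ(θ)ᵀ H⁻¹ ∇fᵢ(θ), can be illustrated on a toy problem. The sketch below is not the paper's K-FAC implementation (which relies on the laplace and ASDL packages); it is a minimal NumPy version for regularized logistic regression, where the exact (Gauss-Newton) Hessian is cheap. All names (`sigmoid`, `delta`, `v`) are our own, chosen for illustration.

```python
import numpy as np

# Toy data: a linearly separable-ish binary classification problem.
rng = np.random.default_rng(0)
N, D = 200, 5
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = (X @ true_w + 0.1 * rng.normal(size=N) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit w by full-batch gradient descent on the L2-regularized logistic loss.
delta = 1e-2                       # regularizer; keeps H positive definite
w = np.zeros(D)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.1 * (X.T @ (p - y) + delta * w) / N

# Per-example gradients g_i = (p_i - y_i) x_i (regularizer excluded, since
# leave-one-out removes only the data term) and the Hessian at the optimum.
p = sigmoid(X @ w)
G = (p - y)[:, None] * X                           # shape (N, D): rows are g_i
H = (X * (p * (1 - p))[:, None]).T @ X + delta * np.eye(D)

# Sensitivities v_i = g_i^T H^{-1} g_i, computed for all examples at once.
Hinv_G = np.linalg.solve(H, G.T)                   # shape (D, N)
v = np.einsum("nd,dn->n", G, Hinv_G)

print(v.shape, bool(v.min() >= 0))
```

Because H is positive definite, every vᵢ is non-negative; examples with large vᵢ are the ones whose removal the memory-perturbation equation predicts would move the model most. For deep networks, forming H exactly is infeasible, which is why the paper resorts to the K-FAC approximation.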