Trade-Offs of Diagonal Fisher Information Matrix Estimators

Authors: Alexander Soen, Ke Sun

NeurIPS 2024

Each reproducibility variable is listed below with its result and the corresponding LLM response.
Research Type: Experimental. We include an empirical analysis of NNs trained on MNIST. Notably, our analysis considers general multi-dimensional NN outputs, extending the case studies of [37], which were limited to 1D distributions by the limitations of their bounds (and the associated computational cost of handling a 4D tensor for the full covariance).
Researcher Affiliation: Academia. Alexander Soen, The Australian National University and RIKEN AIP (alexander.soen@anu.edu.au); Ke Sun, CSIRO's Data61 and The Australian National University (Ke.Sun@data61.csiro.au).
Pseudocode: No. The paper does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The NeurIPS Paper Checklist states: 'Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Only simple plots are presented in the paper. The first figure is a toy example of natural gradient descent with the FIM estimators. The other plots consider variance estimation and bounds of the paper in small-scale networks.'
Open Datasets: Yes. We examine the MNIST classification task [21] (CC BY-SA 3.0) using multilayer perceptrons (MLPs) with four densely connected layers, sigmoid activations, and a dropout layer.
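The paper does not specify layer widths or the dropout rate. A minimal PyTorch sketch of such a network, with hidden sizes and dropout probability chosen here purely for illustration:

```python
import torch.nn as nn

# Sketch of the described MLP: four densely connected layers, sigmoid
# activations, and a dropout layer. Hidden widths (128, 64) and the
# dropout rate (0.5) are assumptions; the paper does not specify them.
mlp = nn.Sequential(
    nn.Flatten(),                       # 28x28 MNIST images -> 784-dim vectors
    nn.Linear(784, 128), nn.Sigmoid(),
    nn.Linear(128, 128), nn.Sigmoid(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 64), nn.Sigmoid(),
    nn.Linear(64, 10),                  # logits for the 10 MNIST classes
)
```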
Dataset Splits: No. A training set of 256 data points is sampled. At each iteration of NGD, 4 random points are drawn from the training set for the update. The test loss is evaluated on a sampled test set of 4096 data points. The paper specifies training and test sets but does not mention a distinct validation set.
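A sketch of how subsets of the stated sizes could be drawn with torchvision; uniform sampling without replacement is an assumption, since the paper only gives the subset sizes:

```python
import torch
from torchvision import datasets, transforms

# Draw 256 training points and 4096 test points from MNIST, as stated.
# The uniform-without-replacement sampling scheme is an assumption.
to_tensor = transforms.ToTensor()
train_full = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
test_full = datasets.MNIST("data", train=False, download=True, transform=to_tensor)

train_set = torch.utils.data.Subset(
    train_full, torch.randperm(len(train_full))[:256].tolist())
test_set = torch.utils.data.Subset(
    test_full, torch.randperm(len(test_full))[:4096].tolist())
```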
Hardware Specification: No. The paper mentions using BackPACK for its calculations but does not provide hardware details such as GPU/CPU models, memory, or other machine specifications used to run the experiments.
Software Dependencies: No. 'We note that to calculate the diagonal Hessians required for the bounds and empirical FIM calculations, we utilize BackPACK [6] for PyTorch.' The paper names these software components but does not provide version numbers for them.
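BackPACK exposes diagonal Hessians through its DiagHessian extension; a minimal sketch of the typical usage pattern follows (the paper does not show its exact invocation, and the mini-batch below is a placeholder):

```python
import torch
import torch.nn as nn
from backpack import backpack, extend
from backpack.extensions import DiagHessian

# Typical BackPACK pattern for per-parameter diagonal Hessians.
# `mlp` is the architecture sketched above; the batch is a placeholder.
model = extend(mlp)
loss_fn = extend(nn.CrossEntropyLoss())

x_batch = torch.randn(4, 1, 28, 28)      # placeholder MNIST-shaped inputs
y_batch = torch.randint(0, 10, (4,))     # placeholder labels

loss = loss_fn(model(x_batch), y_batch)
with backpack(DiagHessian()):
    loss.backward()

diag_hessians = [p.diag_h for p in model.parameters()]  # one per parameter tensor
```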
Experiment Setup: Yes. Natural gradient descent (NGD) is taken using both Î₁(θ) and Î₂(θ). The estimated FIMs use only a single y | x sample for each input x. We use a learning rate of η = 0.01 over 256 epochs. A training set of 256 data points is sampled. At each iteration of NGD, we sample 4 random points from the training set for the update. The test loss is evaluated on a test set of 4096 sampled data points.
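A sketch of one such NGD step with the diagonal estimator Î₁(θ), assumed here, following the usual definitions, to be the elementwise squared score at a single label sampled from the model (Î₂(θ) would instead use negative diagonal Hessians, e.g. via BackPACK as above). The damping term eps is our addition for numerical stability, not the paper's:

```python
import torch
import torch.nn.functional as F

lr, eps = 0.01, 1e-8   # eta = 0.01 from the paper; eps is an assumed damping term

def ngd_step(model, x, y_true):
    logits = model(x)
    # Diagonal FIM estimate (Î₁): squared gradient of the negative
    # log-likelihood at one label sampled from the model per input.
    y_sampled = torch.distributions.Categorical(logits=logits).sample()
    fim_diag = [
        g.pow(2) for g in torch.autograd.grad(
            F.cross_entropy(logits, y_sampled),
            list(model.parameters()), retain_graph=True)
    ]
    # Ordinary loss gradient at the true labels.
    grads = torch.autograd.grad(
        F.cross_entropy(logits, y_true), list(model.parameters()))
    with torch.no_grad():
        for p, g, f in zip(model.parameters(), grads, fim_diag):
            p -= lr * g / (f + eps)   # diagonal-FIM-preconditioned update
```

In the stated setup, ngd_step would be called on 4 points drawn from the 256-point training set at each iteration, for 256 epochs. Note that squaring the mini-batch-averaged gradient, as done here for brevity, is a further simplification of a per-example estimator.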