Trade-Offs of Diagonal Fisher Information Matrix Estimators
Authors: Alexander Soen, Ke Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We include an empirical analysis of NNs trained on MNIST. Notably, our analysis considers general multi-dimensional NN output. This extends the case studies of [37] which was limited to 1D distributions due to the limitations of their bounds (and their associated computational costs of dealing with a 4D tensor of the full covariance). |
| Researcher Affiliation | Academia | Alexander Soen (The Australian National University; RIKEN AIP) alexander.soen@anu.edu.au; Ke Sun (CSIRO's Data61; The Australian National University) Ke.Sun@data61.csiro.au |
| Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The NeurIPS Paper Checklist states: 'Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Only simple plots are presented in the paper. The first figure is a toy example of natural gradient descent with the FIM estimators. The other plots consider variance estimation and bounds of the paper in small scale networks.' |
| Open Datasets | Yes | We examine the MNIST classification task [21] (CC BY-SA 3.0) using multilayer perceptrons (MLP) with four densely connected layers, sigmoid activations, and a dropout layer. (A hedged PyTorch sketch of this architecture is given below the table.) |
| Dataset Splits | No | A training set of 256 data points is sampled. At each iteration of NGD, we sample 4 random points from the training set for the update. The test loss is evaluated on a test set of 4096 sampled data points. The paper specifies training and test sets but does not mention a distinct validation set. |
| Hardware Specification | No | The paper mentions utilizing 'BackPACK' for calculations but does not provide specific hardware details such as GPU/CPU models, memory, or other computer specifications used for running experiments. |
| Software Dependencies | No | We note that to calculate the diagonal Hessians required for the bounds and empirical FIM calculations, we utilize BackPACK [6] for PyTorch. The paper mentions software components but does not provide specific version numbers for them. (A hedged usage sketch of BackPACK appears below the table.) |
| Experiment Setup | Yes | Natural gradient descent (NGD) is performed using both Î₁(θ) and Î₂(θ). The estimated FIMs use only a single y | x sample for each input x. We use a learning rate of η = 0.01 over 256 epochs. A training set of 256 data points is sampled. At each iteration of NGD, we sample 4 random points from the training set for the update. The test loss is evaluated on a test set of 4096 sampled data points. (A hedged sketch of this setup appears below the table.) |
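
The Open Datasets row describes the network only at a high level: an MLP with four densely connected layers, sigmoid activations, and a dropout layer. The following is a minimal PyTorch sketch consistent with that description; the hidden widths, dropout rate, and dropout placement are assumptions, since the excerpt does not specify them.

```python
# Hypothetical MLP matching the paper's description for 10-class MNIST:
# four densely connected layers, sigmoid activations, one dropout layer.
# Layer widths and dropout rate/placement are assumptions.
import torch.nn as nn

def make_mlp(hidden: int = 64, p_drop: float = 0.1) -> nn.Sequential:
    return nn.Sequential(
        nn.Flatten(),                       # 28x28 MNIST image -> 784-dim vector
        nn.Linear(28 * 28, hidden), nn.Sigmoid(),
        nn.Linear(hidden, hidden), nn.Sigmoid(),
        nn.Dropout(p=p_drop),
        nn.Linear(hidden, hidden), nn.Sigmoid(),
        nn.Linear(hidden, 10),              # logits for the 10 MNIST classes
    )
```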
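The Software Dependencies row reports that BackPACK was used on top of PyTorch to compute diagonal Hessians and empirical FIM quantities. Below is a minimal sketch of that kind of usage, assuming BackPACK's `DiagHessian` extension; the single-layer model and random batch are toy stand-ins, not the authors' configuration.

```python
# Hedged sketch: obtaining diagonal Hessians with BackPACK on a toy model.
import torch
import torch.nn as nn
from backpack import backpack, extend
from backpack.extensions import DiagHessian

# Extend the model and loss so BackPACK can attach its second-order hooks.
model = extend(nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)))
loss_fn = extend(nn.CrossEntropyLoss())

x = torch.randn(4, 1, 28, 28)              # toy batch standing in for MNIST
y = torch.randint(0, 10, (4,))

loss = loss_fn(model(x), y)
with backpack(DiagHessian()):
    loss.backward()                         # populates .diag_h on each parameter

for name, p in model.named_parameters():
    print(name, p.diag_h.shape)             # diagonal Hessian, same shape as p
```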
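The Experiment Setup row gives the NGD hyperparameters: learning rate 0.01, 256 epochs, 256 training points, 4 points per update, and a single y | x sample per input for the FIM estimate. The sketch below shows one plausible reading of that loop with an Î₁-style squared-score diagonal estimator; the model, damping constant, batch-level squaring, and stand-in data are assumptions rather than the authors' implementation.

```python
# Hedged sketch of NGD preconditioned by a diagonal FIM estimate.
# Hyperparameters (lr=0.01, 256 epochs, 256 training points, batches of 4)
# follow the paper; everything else is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

def ngd_step(model, x, y_true, lr=0.01, damping=1e-8):
    logits = model(x)

    # Diagonal FIM estimate (Î₁-style): squared score with a single label
    # sampled from the model's own predictive distribution.
    y_samp = torch.distributions.Categorical(logits=logits).sample()
    model.zero_grad()
    F.cross_entropy(logits, y_samp).backward(retain_graph=True)
    fim_diag = [p.grad.detach().clone() ** 2 for p in model.parameters()]

    # Ordinary loss gradient on the true labels.
    model.zero_grad()
    F.cross_entropy(logits, y_true).backward()

    # Natural-gradient update: elementwise preconditioning by the diagonal FIM.
    with torch.no_grad():
        for p, f in zip(model.parameters(), fim_diag):
            p -= lr * p.grad / (f + damping)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
X_train = torch.randn(256, 1, 28, 28)       # stand-in for 256 MNIST images
Y_train = torch.randint(0, 10, (256,))

for epoch in range(256):
    idx = torch.randperm(256)[:4]           # 4 random training points per update
    ngd_step(model, X_train[idx], Y_train[idx])
```

Replacing the squared-score estimate with a sampled-label diagonal Hessian (for example via BackPACK's `DiagHessian`, as in the previous sketch) would give an Î₂-style variant of the same loop.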