On the Variance of the Fisher Information for Deep Learning
Authors: Alexander Soen, Ke Sun
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We investigate two such estimators based on two equivalent representations of the FIM, both unbiased and consistent. Their estimation quality is naturally gauged by their variance, given in closed form. We analyze how the parametric structure of a deep neural network can affect the variance. The meaning of this variance measure and its upper bounds are then discussed in the context of deep learning. Our central results, Theorems 4 and 6, present the variance of Î₁(θ) and Î₂(θ) in closed form, which is further extended to upper bounds in simpler forms. (A sketch of the two estimators follows the table.) |
| Researcher Affiliation | Academia | Alexander Soen, The Australian National University, alexander.soen@anu.edu.au; Ke Sun, CSIRO's Data61, Australia, and The Australian National University, sunk@ieee.org |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It primarily presents mathematical derivations and theoretical analyses. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper focuses on theoretical analysis of statistical models and distributions (e.g., Bernoulli, Normal, Poisson) rather than conducting empirical experiments on specific, publicly available datasets. Therefore, it does not provide concrete access information for any dataset used in training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with datasets that would require explicit training, validation, or test splits. |
| Hardware Specification | No | The paper mentions 'modern GPUs' in a general context regarding auto-differentiation frameworks, but it does not specify any particular hardware (e.g., GPU models, CPU types, memory) used to conduct experiments within the scope of this paper. |
| Software Dependencies | No | The paper mentions 'AD frameworks such as PyTorch [26]' but does not specify version numbers for PyTorch or any other software dependencies, which would be necessary for reproducibility. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis, not on empirical experimentation. Therefore, it does not provide details about an experimental setup, such as hyperparameters or system-level training settings. |
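
For context on the Research Type row: the FIM admits the standard equivalent representations I(θ) = E[s sᵀ] = −E[H], where s = ∇log p(y|x, θ) is the score and H is its Hessian, giving the two unbiased Monte Carlo estimators Î₁(θ) = (1/n) Σᵢ sᵢ sᵢᵀ and Î₂(θ) = −(1/n) Σᵢ Hᵢ. Below is a minimal sketch of both in PyTorch (the AD framework the paper cites), assuming these standard representations correspond to the paper's Î₁ and Î₂; the toy logistic-regression model, sizes, and variable names are illustrative assumptions, not the paper's setup (the paper provides no code).

```python
# Minimal sketch: two unbiased Monte Carlo estimators of the Fisher information
# matrix, I1_hat (score outer products) and I2_hat (negative Hessians).
# The toy logistic-regression model is an illustrative assumption, not the paper's.
import torch

torch.manual_seed(0)
d, n = 3, 512  # parameter dimension, number of Monte Carlo samples

theta = torch.zeros(d, requires_grad=True)
x = torch.randn(n, d)
# y must be sampled from the model p(y | x, theta) itself, not taken from data
# labels, for these to estimate the FIM (rather than the empirical FIM).
with torch.no_grad():
    y = torch.bernoulli(torch.sigmoid(x @ theta))

def log_lik(t, xi, yi):
    # Bernoulli log-likelihood of one sample under the logit xi @ t.
    return -torch.nn.functional.binary_cross_entropy_with_logits(xi @ t, yi)

# I1_hat: (1/n) * sum_i s_i s_i^T, with score s_i = grad_t log p(y_i | x_i, t).
I1 = torch.zeros(d, d)
for i in range(n):
    (s,) = torch.autograd.grad(log_lik(theta, x[i], y[i]), theta)
    I1 += torch.outer(s, s)
I1 /= n

# I2_hat: -(1/n) * sum_i H_i, with H_i the per-sample log-likelihood Hessian.
I2 = torch.zeros(d, d)
for i in range(n):
    H = torch.autograd.functional.hessian(lambda t: log_lik(t, x[i], y[i]), theta)
    I2 -= H
I2 /= n

print(I1)  # both converge to the same FIM as n grows;
print(I2)  # their variances differ, which is what the paper quantifies.
```

In this Bernoulli example the per-sample Hessian happens not to depend on the sampled y, so Î₂ has zero variance over y given x here, a concrete instance of the estimator-quality gap that the paper's Theorems 4 and 6 quantify in general.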