Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Analytic Insights into Structure and Rank of Neural Network Hessian Maps
Authors: Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Setup. We test our results on a variety of datasets: MNIST [45], Fashion MNIST [46], CIFAR10 [47]; for various loss types: MSE, cross-entropy, cosh; across several initialization schemes: Glorot [48], uniform, orthogonal [21]. Procedure. To verify the prediction of our theoretical results, we perform an exact calculation of the rank by computing the full Hessian and the corresponding singular value decomposition (SVD). Results. We study how the rank varies as a function of the sample size N and the network architecture (for varying widths). Fig. 2 shows this for a linear network on CIFAR10 with MSE loss. |
| Researcher Affiliation | Academia | Sidak Pal Singh a,b, Gregor Bachmann a, and Thomas Hofmann a,b; a ETH Zürich; b Max Planck ETH Center for Learning Systems |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The most recent version of our paper can be found at https://arxiv.org/abs/2106.16225 and the code corresponding to the experiments is located at https://github.com/dalab/hessian-rank. |
| Open Datasets | Yes | Setup. We test our results on a variety of datasets: MNIST [45], Fashion MNIST [46], CIFAR10 [47] |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, or test dataset splits (e.g., exact percentages or sample counts). It refers to using well-known datasets, which often have standard splits, but these are not detailed in the text. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | Setup. We test our results on a variety of datasets: MNIST [45], Fashion MNIST [46], CIFAR10 [47]; for various loss types: MSE, cross-entropy, cosh; across several initialization schemes: Glorot [48], uniform, orthogonal [21]. We use 2 hidden layers of size 30, 20 with Tanh activation on MNIST (Figure 3 caption). We train a linear network with hidden layers 50, 20, 20, 20 (Fig. 2a) and M, M (Fig. 2b, 2c). A linear network with hidden layer sizes 25, 20, 15 is trained using a linear teacher (Figure 5 caption). |
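The verification procedure quoted above (compute the full Hessian, take its SVD, and count the nonzero singular values to obtain the exact rank) can be sketched as follows. This is a minimal illustration, not the authors' code: the network sizes are hypothetical toy values rather than those in the paper, the Hessian is formed by finite differences instead of exact autodiff, and the rank tolerance is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (sizes chosen for speed, not the paper's):
# a one-hidden-layer linear network d_in -> h -> d_out with MSE loss.
d_in, h, d_out, N = 4, 3, 2, 10
X = rng.standard_normal((N, d_in))
Y = rng.standard_normal((N, d_out))

shapes = [(h, d_in), (d_out, h)]
sizes = [r * c for r, c in shapes]
p = sum(sizes)  # total number of parameters

def loss(theta):
    """MSE loss of the linear network at flattened parameters theta."""
    W1 = theta[:sizes[0]].reshape(shapes[0])
    W2 = theta[sizes[0]:].reshape(shapes[1])
    pred = X @ W1.T @ W2.T
    return np.mean((pred - Y) ** 2)

theta0 = rng.standard_normal(p)

# Full Hessian via central finite differences (exact autodiff would
# also work; this keeps the sketch dependency-free beyond NumPy).
eps = 1e-4
H = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        t_pp = theta0.copy(); t_pp[i] += eps; t_pp[j] += eps
        t_pm = theta0.copy(); t_pm[i] += eps; t_pm[j] -= eps
        t_mp = theta0.copy(); t_mp[i] -= eps; t_mp[j] += eps
        t_mm = theta0.copy(); t_mm[i] -= eps; t_mm[j] -= eps
        H[i, j] = (loss(t_pp) - loss(t_pm) - loss(t_mp) + loss(t_mm)) / (4 * eps**2)

# Rank as in the paper's procedure: SVD of the full Hessian, counting
# singular values above a (here assumed) relative tolerance.
s = np.linalg.svd(H, compute_uv=False)
rank = int(np.sum(s > 1e-6 * s[0]))
print(f"parameters: {p}, Hessian rank: {rank}")
```

The resulting numerical rank can then be compared against the paper's theoretical predictions, repeating the measurement across sample sizes N and layer widths as in Fig. 2.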