Analytic Insights into Structure and Rank of Neural Network Hessian Maps

Authors: Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Setup. We test our results on a variety of datasets: MNIST [45], Fashion MNIST [46], CIFAR10 [47]; for various loss types: MSE, cross-entropy, cosh; across several initialization schemes: Glorot [48], uniform, orthogonal [21]. Procedure. To verify the predictions of our theoretical results, we perform an exact calculation of the rank by computing the full Hessian and the corresponding singular value decomposition (SVD). Results. We study how the rank varies as a function of the sample size N and the network architecture (for varying widths). Fig. 2 shows this for a linear network on CIFAR10 with MSE loss." (A minimal sketch of this exact rank computation is given after the table.)
Researcher Affiliation | Academia | Sidak Pal Singh (a,b), Gregor Bachmann (a), and Thomas Hofmann (a,b); (a) ETH Zürich, (b) Max Planck ETH Center for Learning Systems
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | "The most recent version of our paper can be found at https://arxiv.org/abs/2106.16225 and the code corresponding to the experiments is located at https://github.com/dalab/hessian-rank."
Open Datasets | Yes | "Setup. We test our results on a variety of datasets: MNIST [45], Fashion MNIST [46], CIFAR10 [47]"
Dataset Splits | No | The paper does not explicitly provide training, validation, or test splits (e.g., exact percentages or sample counts). It uses well-known datasets that have standard splits, but these are not detailed in the text.
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | "Setup. We test our results on a variety of datasets: MNIST [45], Fashion MNIST [46], CIFAR10 [47]; for various loss types: MSE, cross-entropy, cosh; across several initialization schemes: Glorot [48], uniform, orthogonal [21]." The reported architectures are: 2 hidden layers of size 30, 20 with Tanh activation on MNIST (Figure 3 caption); a linear network with hidden layers 50, 20, 20, 20 (Fig. 2a) and M, M (Fig. 2b, 2c); and a linear network with hidden layer sizes 25, 20, 15, trained using a linear teacher (Figure 5 caption). (See the architecture sketch after the table.)
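
The rank check quoted under Research Type is straightforward to reproduce in spirit. Below is a minimal PyTorch sketch, not the authors' released code (that lives at https://github.com/dalab/hessian-rank): it assumes a small deep linear network on random stand-in data (dimensions shrunk so the full Hessian fits in memory), builds the full Hessian of the MSE loss with torch.autograd.functional.hessian over a flat parameter vector, and counts singular values above a numerical tolerance.

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)

# Toy data standing in for N CIFAR10 samples (the true input dimension
# 32*32*3 = 3072 would make the full Hessian huge, so we shrink it).
N, d_in, d_out = 20, 10, 5
X = torch.randn(N, d_in)
Y = torch.randn(N, d_out)

# Deep linear network with assumed hidden widths (8, 6): f(x) = W3 W2 W1 x.
dims = [d_in, 8, 6, d_out]
shapes = [(dims[i + 1], dims[i]) for i in range(len(dims) - 1)]
n_params = sum(r * c for r, c in shapes)

def mse_loss(theta):
    """MSE loss of the deep linear network as a function of the flat parameter vector."""
    mats, offset = [], 0
    for r, c in shapes:
        mats.append(theta[offset:offset + r * c].view(r, c))
        offset += r * c
    out = X
    for W in mats:
        out = out @ W.T
    return 0.5 * ((out - Y) ** 2).mean()

theta0 = torch.randn(n_params)   # stand-in for a Glorot/uniform/orthogonal init
H = hessian(mse_loss, theta0)    # full (n_params x n_params) Hessian
S = torch.linalg.svdvals(H)
tol = S.max() * n_params * torch.finfo(S.dtype).eps
print("Hessian rank:", int((S > tol).sum()), "of", n_params)
```

Sweeping N (or the hidden widths) and re-running this computation is exactly the kind of rank-versus-sample-size curve the quoted passage describes for Fig. 2.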
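The architectures listed under Experiment Setup translate directly into standard PyTorch modules. The following sketch is hypothetical: the layer sizes come from the table, but the bias-free layers, the uniform-init range, and the input/output dimensions are assumptions, not details given in the paper.

```python
import torch.nn as nn

def make_mlp(sizes, activation=nn.Tanh, init="glorot"):
    """MLP with the given layer sizes and one of the three init schemes
    named in the paper (Glorot, uniform, orthogonal)."""
    layers = []
    for i in range(len(sizes) - 1):
        linear = nn.Linear(sizes[i], sizes[i + 1], bias=False)  # bias-free: assumption
        if init == "glorot":
            nn.init.xavier_uniform_(linear.weight)
        elif init == "uniform":
            nn.init.uniform_(linear.weight, -0.1, 0.1)  # range is an assumption
        elif init == "orthogonal":
            nn.init.orthogonal_(linear.weight)
        layers.append(linear)
        if activation is not None and i < len(sizes) - 2:
            layers.append(activation())
    return nn.Sequential(*layers)

mnist_net = make_mlp([784, 30, 20, 10], activation=nn.Tanh)         # Figure 3
linear_net = make_mlp([3072, 50, 20, 20, 20, 10], activation=None)  # Fig. 2a
```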