Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ViViT: Curvature Access Through The Generalized Gauss-Newton’s Low-Rank Structure
Authors: Felix Dangel, Lukas Tatzel, Philipp Hennig
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this by conducting performance benchmarks and substantiate ViViT's usefulness by studying the impact of noise on the GGN's structural properties during neural network training. |
| Researcher Affiliation | Academia | Felix Dangel EMAIL University of Tübingen, Tübingen, Germany Lukas Tatzel EMAIL University of Tübingen, Tübingen, Germany Philipp Hennig EMAIL University of Tübingen & MPI for Intelligent Systems, Tübingen, Germany |
| Pseudocode | No | The paper describes methods and processes through mathematical equations and textual explanations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We introduce approximations that allow a flexible trade-off between computational cost and accuracy, and provide a fully-featured efficient implementation in PyTorch (Paszke et al., 2019) on top of the BackPACK (Dangel et al., 2020) package at https://github.com/f-dangel/vivit. The code used for the experiments is available at https://github.com/f-dangel/vivit-experiments. |
| Open Datasets | Yes | Architectures include three deep convolutional neural networks from DeepOBS (Schneider et al., 2019) (2c2d on Fashion-MNIST, 3c3d on CIFAR-10 and All-CNN-C on CIFAR-100), as well as residual networks from He et al. (2016) on CIFAR-10 based on Idelbayev (2018); all are equipped with cross-entropy loss. |
| Dataset Splits | Yes | In experiments with fixed mini-batches the batch sizes correspond to DeepOBS default values for training where possible (CIFAR-10: N = 128, Fashion-MNIST: N = 128). The residual networks use a batch size of N = 128. On CIFAR-100 (trained with N = 256), we reduce the batch size to N = 64 to fit the exact computation on the full mini-batch, used as baseline, into memory. If the GGN approximation is evaluated on a subset of the mini-batch (sub), N/8 of the samples are used (as in Zhang et al. (2017)). |
| Hardware Specification | Yes | Results in this section were generated on a workstation with an Intel Core i7-8700K CPU (32 GB) and one NVIDIA GeForce RTX 2080 Ti GPU (11 GB). |
| Software Dependencies | No | The paper mentions software like PyTorch and BackPACK and cites their original publications, but does not provide specific version numbers for these libraries used in the experiments. |
| Experiment Setup | Yes | We train the following DeepOBS (Schneider et al., 2019) architectures with SGD and Adam: 3c3d on CIFAR-10, 2c2d on Fashion-MNIST and All-CNN-C on CIFAR-100; all are equipped with cross-entropy loss. To ensure successful training, we use the hyperparameters from Dangel et al. (2020) (see Table S.3). We also train a residual network ResNet-32 (He et al., 2016) with cross-entropy loss on CIFAR-10 with both SGD and Adam. For this, we use a batch size of 128 and train for 180 epochs. Momentum for SGD was fixed to 0.9, and Adam uses the default parameters (β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸). |
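For context on the paper's central idea: the generalized Gauss-Newton (GGN) matrix admits a low-rank outer-product factorization G = V Vᵀ, where V has one column per sample-class pair, so the nontrivial eigenvalues of the huge D × D matrix G can be read off the small Gram matrix Vᵀ V. The NumPy sketch below illustrates this Gram-matrix trick in isolation; it uses a random stand-in for V and is not the vivit/BackPACK API.

```python
import numpy as np

# Hypothetical dimensions: D parameters, N samples, C classes.
D, N, C = 1000, 8, 3
rng = np.random.default_rng(0)

# Stand-in for the factor V (columns would be scaled Jacobian vectors);
# the GGN is G = V @ V.T with rank at most N * C << D.
V = rng.standard_normal((D, N * C))

# Nontrivial eigenvalues of G equal those of the small (N*C x N*C)
# Gram matrix V.T @ V -- no D x D matrix is ever formed.
gram = V.T @ V
evals_small = np.linalg.eigvalsh(gram)

# Expensive direct computation, feasible here only because D is tiny.
evals_full = np.linalg.eigvalsh(V @ V.T)

# The top N*C eigenvalues of G match the Gram eigenvalues;
# the remaining D - N*C eigenvalues are (numerically) zero.
assert np.allclose(np.sort(evals_small), np.sort(evals_full[-N * C :]))
```

Working in the N·C-dimensional Gram space instead of the D-dimensional parameter space is what makes eigenvalue access tractable for deep networks, where D runs into the millions.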