BackPACK: Packing more into Backprop

Authors: Felix Dangel, Frederik Kunstner, Philipp Hennig

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To illustrate the capabilities of BACKPACK, we use it to implement preconditioned gradient descent optimizers with diagonal approximations of the GGN and recent Kronecker factorizations KFAC (Martens & Grosse, 2015), KFLR, and KFRA (Botev et al., 2017). Our results show that the curvature approximations based on Monte-Carlo (MC) estimates of the GGN, the approach used by KFAC, give similar progress per iteration to their more accurate counterparts, while being much cheaper to compute. While the naïve update rule we implement does not surpass first-order baselines such as SGD with momentum and Adam (Kingma & Ba, 2015), its implementation with various curvature approximations is made straightforward." (Section 1.1) and "We benchmark the overhead of BACKPACK on the CIFAR-10 and CIFAR-100 datasets, using the 3C3D network provided by DEEPOBS (Schneider et al., 2019) and the ALL-CNN-C network of Springenberg et al. (2015). The results are shown in Fig. 6." (Section 3)
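For reference, the snippet below is a minimal sketch of such a diagonally preconditioned update, assuming BACKPACK's DiagGGNMC extension and a toy PyTorch model; the model, data, and hyperparameter values are placeholders and this is not the paper's experiment code.

    import torch
    from backpack import backpack, extend
    from backpack.extensions import DiagGGNMC

    # Toy model and batch (placeholders, not the 3C3D / ALL-CNN-C setups).
    model = extend(torch.nn.Linear(784, 10))
    lossfunc = extend(torch.nn.CrossEntropyLoss())
    X, y = torch.randn(128, 784), torch.randint(0, 10, (128,))

    alpha, lam = 1e-2, 1e-2  # learning rate and damping, tuned over a grid in the paper

    loss = lossfunc(model(X), y)
    with backpack(DiagGGNMC()):   # Monte-Carlo estimate of the diagonal GGN
        loss.backward()           # populates p.diag_ggn_mc alongside p.grad

    with torch.no_grad():
        for p in model.parameters():
            # Damped diagonal preconditioning: p <- p - alpha * grad / (diag(G) + lam)
            p.sub_(alpha * p.grad / (p.diag_ggn_mc + lam))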
Researcher Affiliation | Academia | Felix Dangel, University of Tuebingen, fdangel@tue.mpg.de; Frederik Kunstner, University of Tuebingen, kunstner@cs.ubc.ca; Philipp Hennig, University of Tuebingen and MPI for Intelligent Systems, Tuebingen, ph@tue.mpg.de
Pseudocode | No | No formally labeled pseudocode or algorithm block found. Figure 1 shows code snippets but is not a pseudocode block.
Open Source Code | Yes | "we provide an implementation on top of PYTORCH, coined BACKPACK, available at https://f-dangel.github.io/backpack/."
Open Datasets | Yes | "We benchmark the overhead of BACKPACK on the CIFAR-10 and CIFAR-100 datasets, using the 3C3D network provided by DEEPOBS (Schneider et al., 2019) and the ALL-CNN-C network of Springenberg et al. (2015)."
Dataset Splits | Yes | "The results shown in this work were obtained with the default strategy, favoring highest final accuracy on the validation set." (Section C.1) and "The best hyperparameter setting is chosen according to the final accuracy on a validation set." (Section 4)
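The paper relies on DEEPOBS to manage data loading and splits; purely as an illustration of validation-based selection, and assuming torchvision with an arbitrary 5,000-image hold-out, such a split could be built as follows.

    import torch
    from torchvision import datasets, transforms

    # Illustrative hold-out size; DEEPOBS handles the actual CIFAR-10 splits internally.
    full_train = datasets.CIFAR10("./data", train=True, download=True,
                                  transform=transforms.ToTensor())
    val_size = 5_000
    train_set, val_set = torch.utils.data.random_split(
        full_train, [len(full_train) - val_size, val_size],
        generator=torch.Generator().manual_seed(0),  # make the split reproducible
    )
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=128)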
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models) were mentioned for running experiments.
Software Dependencies | No | No specific version numbers were provided for software dependencies; only names like PYTORCH are mentioned.
Experiment Setup | Yes | "Both the learning rate α and damping λ are tuned over the grid α ∈ {10^-4, 10^-3, 10^-2, 10^-1, 1}, λ ∈ {10^-4, 10^-3, 10^-2, 10^-1, 1, 10}." (Section C.2) and "We use the same batch size (N = 128 for all problems, except N = 256 for ALL-CNN-C on CIFAR-100) as the baselines and the optimizers run for the identical number of epochs." (Section C.2)
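A small sketch of the quoted tuning protocol, assuming a hypothetical run_experiment helper that trains for the fixed number of epochs and returns the final validation accuracy; the helper is a stand-in, not code from the paper.

    from itertools import product

    # Grid from Section C.2 for learning rate (alpha) and damping (lambda).
    ALPHAS = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
    LAMBDAS = [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]

    def run_experiment(alpha, lam, batch_size=128):
        """Hypothetical stand-in: train the problem for the fixed number of
        epochs and return the final validation accuracy."""
        return 0.0  # replace with an actual training run

    # Evaluate every (alpha, lambda) pair and keep the best by validation accuracy.
    results = {(a, l): run_experiment(a, l) for a, l in product(ALPHAS, LAMBDAS)}
    best_alpha, best_lam = max(results, key=results.get)
    print(f"best: alpha={best_alpha}, lambda={best_lam}")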