M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

Authors: Elias Frantar, Eldar Kurtic, Dan Alistarh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experimental Validation): For pruning, our implementation provides order-of-magnitude improvements over the block-wise approximation of [37] for classic benchmarks such as pruning ResNet-50 and MobileNet on the ImageNet dataset. ... What is more, our preconditioned SGD (even without momentum) can be competitive in terms of validation accuracy with state-of-the-art optimizers on models of moderate size, including compact vision architectures and Transformer language models [42]. Its computational overheads are of 5%-55% relative to vanilla SGD on standard CNN architectures.
Researcher Affiliation | Collaboration | Elias Frantar (IST Austria, elias.frantar@ist.ac.at), Eldar Kurtic (IST Austria, eldar.kurtic@ist.ac.at), Dan Alistarh (IST Austria & Neural Magic, dan.alistarh@ist.ac.at)
Pseudocode | No | The paper describes the algorithms in prose and equations but does not include a formally labeled pseudocode or algorithm block.
Open Source Code | Yes | Implementations are available at [9] and [17].
Open Datasets | Yes | We prune CNNs (ResNet-50 [15] and MobileNet-V1 [16]) on the ImageNet dataset [36].
Dataset Splits | Yes | We prune CNNs (ResNet-50 [15] and MobileNet-V1 [16]) on the ImageNet dataset [36].
Hardware Specification | Yes | Timing experiments are run on a machine with NVIDIA RTX 2080 Ti GPUs, a 48-core Intel CPU, and 512 GB of RAM.
Software Dependencies | No | PyTorch [34] implementations of a pruning and optimization library. TensorFlow [1] is also mentioned, but specific version numbers for these software dependencies are not provided in the main text.
Experiment Setup | Yes | Following [37], we used batched gradients (of size 16) as single samples inside the Fisher approximation.
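
The experiment-setup row above refers to building the empirical Fisher from gradient samples, and the matrix-free inverse-Fisher-vector products at the core of M-FAC rest on folding rank-one gradient outer products into the inverse one at a time. The NumPy snippet below is a minimal illustrative sketch of that idea for the static case, assuming the gradients (e.g., the size-16 batched gradients used as single samples) are stacked row-wise into a matrix G; the function name empirical_fisher_ihvp and the parameters G, x, and lam are our own illustrative choices, and the paper's actual algorithm adds precomputation, blocking, and a dynamic variant not shown here.

```python
import numpy as np

def empirical_fisher_ihvp(G, x, lam=1e-4):
    """Apply the inverse of F = lam*I + (1/m) * sum_i g_i g_i^T to a vector x
    without forming the d x d matrix, via m successive Sherman-Morrison
    rank-one updates. G has shape (m, d): one (batched) gradient per row."""
    m, _ = G.shape
    # P[i] ends up holding F_{i-1}^{-1} g_i as updates are folded in one by one.
    P = G / lam                      # F_0^{-1} g_j = g_j / lam for every row j
    denoms = np.empty(m)
    for i in range(m):
        denoms[i] = m + G[i] @ P[i]
        # Fold update i into the still-unprocessed rows j > i:
        # F_i^{-1} g_j = F_{i-1}^{-1} g_j - (g_i^T F_{i-1}^{-1} g_j / denom_i) * P[i]
        coeffs = (P[i + 1:] @ G[i]) / denoms[i]
        P[i + 1:] -= np.outer(coeffs, P[i])
    # Replay the same recursion on the query vector x.
    v = x / lam
    for i in range(m):
        v -= ((G[i] @ v) / denoms[i]) * P[i]
    return v
```

For small dimensions the result can be checked against a dense solve, np.linalg.solve(lam * np.eye(d) + G.T @ G / m, x); a preconditioned SGD step in the spirit of the quoted claim would then use such a routine to precondition the current gradient before the parameter update, e.g. w -= lr * empirical_fisher_ihvp(G, grad).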