M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
Authors: Elias Frantar, Eldar Kurtic, Dan Alistarh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From Section 4 (Experimental Validation): For pruning, our implementation provides order-of-magnitude improvements over the block-wise approximation of [37] for classic benchmarks such as pruning ResNet50 and MobileNet on the ImageNet dataset. ... What is more, our preconditioned SGD (even without momentum) can be competitive in terms of validation accuracy with state-of-the-art optimizers on models of moderate size, including compact vision architectures and Transformer language models [42]. Its computational overheads are 5%–55% relative to vanilla SGD on standard CNN architectures. |
| Researcher Affiliation | Collaboration | Elias Frantar (IST Austria, elias.frantar@ist.ac.at); Eldar Kurtic (IST Austria, eldar.kurtic@ist.ac.at); Dan Alistarh (IST Austria & Neural Magic, dan.alistarh@ist.ac.at) |
| Pseudocode | No | The paper describes the algorithms in prose and equations but does not include a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Implementations are available at [9] and [17]. |
| Open Datasets | Yes | We prune CNNs (ResNet-50 [15] and MobileNet-V1 [16]) on the ImageNet dataset [36]. |
| Dataset Splits | Yes | We prune CNNs (ResNet-50 [15] and MobileNet-V1 [16]) on the ImageNet dataset [36]. |
| Hardware Specification | Yes | Timing experiments are run on a machine with NVIDIA RTX 2080 Ti GPUs, a 48-core Intel CPU, and 512 GB of RAM. |
| Software Dependencies | No | PyTorch [34] implementations of a pruning and optimization library. TensorFlow [1] is mentioned, but specific version numbers for these software dependencies are not provided in the main text. |
| Experiment Setup | Yes | Following [37], we used batched gradients (of size 16) as single samples inside the Fisher approximation. |
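
The Experiment Setup row quotes the paper's use of batched gradients (of size 16) as single samples inside the empirical Fisher approximation, i.e. F = λI + (1/m) Σₖ gₖgₖᵀ. For readers checking reproducibility, below is a minimal, illustrative sketch of the matrix-free inverse-Fisher-vector product idea behind M-FAC's static algorithm, built from iterated Sherman-Morrison updates. This is an assumption-laden reconstruction, not the authors' code: the function name `build_ifvp`, the default `lam`, and the dense toy validation are all hypothetical, and the paper's optimized implementations (with blocking and parallelization) are the ones available at [9] and [17].

```python
import torch

def build_ifvp(grads, lam=1e-4):
    # Hypothetical sketch of a matrix-free inverse-empirical-Fisher-vector
    # product, F = lam*I + (1/m) * sum_k g_k g_k^T.
    # With A_0 = lam*I and A_k = A_{k-1} + (1/m) g_k g_k^T, Sherman-Morrison
    # gives  A_k^{-1} x = A_{k-1}^{-1} x - p_k (p_k^T x) / d_k,
    # where  p_k = A_{k-1}^{-1} g_k  and  d_k = m + g_k^T p_k.
    # Only the m vectors p_k are stored (O(m*d) memory); F is never formed.
    m, d = grads.shape
    ps = torch.empty_like(grads)                 # rows hold p_1 ... p_m
    denoms = torch.empty(m, dtype=grads.dtype)   # holds d_1 ... d_m
    for k in range(m):
        g = grads[k]
        p = g / lam                              # A_0^{-1} g
        if k > 0:
            dots = ps[:k] @ g                    # p_j^T g for all j < k
            p = p - (dots / denoms[:k]) @ ps[:k] # subtract rank-1 corrections
        denoms[k] = m + torch.dot(g, p)
        ps[k] = p

    def ifvp(v):
        # Unrolled recursion:  F^{-1} v = v/lam - sum_k p_k (p_k^T v) / d_k
        return v / lam - ((ps @ v) / denoms) @ ps

    return ifvp
```

A toy sanity check against a dense solve (only feasible at small dimension) confirms the recursion, using m = 16 gradients to mirror the quoted batch size:

```python
torch.manual_seed(0)
m, d = 16, 200
grads = torch.randn(m, d, dtype=torch.float64)
lam = 1e-4
apply_inv = build_ifvp(grads, lam)

# Dense reference: materialize F explicitly and solve directly.
F = lam * torch.eye(d, dtype=torch.float64) + grads.T @ grads / m
v = torch.randn(d, dtype=torch.float64)
print(torch.allclose(apply_inv(v), torch.linalg.solve(F, v)))  # True
```

This O(m·d) apply cost, versus O(d²) for a dense inverse, is the "matrix-free" property the paper's title refers to; the released library additionally handles the dynamic (sliding-window) setting used for optimization.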