AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks
Authors: Alexandra Peste, Eugenia Iofinova, Adrian Vladu, Dan Alistarh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive empirical investigation, showing that AC/DC provides consistently good results on a wide range of models and tasks (ResNet [28] and MobileNets [30] on the ImageNet [49] / CIFAR [36] datasets, and Transformers [56, 10] on WikiText [42]), under standard values of the training hyper-parameters. Specifically, when executed on the same number of training epochs, our method outperforms all previous sparse training methods in terms of the accuracy of the resulting sparse model, often by significant margins. |
| Researcher Affiliation | Collaboration | Alexandra Peste (IST Austria), Eugenia Iofinova (IST Austria), Adrian Vladu (CNRS & IRIF), Dan Alistarh (IST Austria & Neural Magic) |
| Pseudocode | Yes | Please see Algorithm 1 for pseudocode. |
| Open Source Code | Yes | The code is available at: https://github.com/IST-DASLab/ACDC. |
| Open Datasets | Yes | We tested AC/DC on image classification tasks (CIFAR-100 [36] and ImageNet [49]) and on language modelling tasks [42] using the Transformer-XL model [10]. |
| Dataset Splits | No | The paper frequently mentions |
| Hardware Specification | No | The paper mentions |
| Software Dependencies | No | The paper states: |
| Experiment Setup | Yes | In all reported results, the models were trained for a fixed number of 100 epochs, using SGD with momentum. We use a cosine learning rate scheduler and training hyper-parameters following [37], but without label smoothing. The models were trained and evaluated using mixed precision (FP16). ... For all results, the AC/DC training schedule starts with a warm-up phase of dense training for 10 epochs, after which we alternate between compression and de-compression every 5 epochs, until the last dense and sparse phase. It is beneficial to allow these last two fine-tuning phases to run longer: the last decompression phase runs for 10 epochs, whereas the final 15 epochs are the compression fine-tuning phase. We reset SGD momentum at the beginning of every decompression phase. |
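The quoted setup describes the AC/DC phase schedule (10-epoch dense warm-up, 5-epoch alternation between compressed and decompressed phases, a final 10-epoch dense phase followed by 15 epochs of sparse fine-tuning, with SGD momentum reset at the start of each decompression phase). The sketch below illustrates how such a schedule could be wired up with global magnitude pruning in PyTorch. It is a minimal illustration of that schedule, not the authors' released code; the helper names (`magnitude_masks`, `is_sparse_phase`, `train_acdc`) and the per-step mask re-application are assumptions for clarity.

```python
# Hedged sketch of an AC/DC-style training loop (not the authors' implementation).
# Phase lengths follow the quoted setup: 10 warm-up epochs, 5-epoch alternation,
# a last 10-epoch dense phase, and 15 final epochs of sparse fine-tuning.
import torch


def magnitude_masks(model, sparsity):
    """Binary masks keeping the largest-magnitude weights of each weight tensor."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:  # skip biases / normalization parameters
            continue
        k = int(sparsity * p.numel())
        if k == 0:
            masks[name] = torch.ones_like(p)
            continue
        threshold = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() > threshold).float()
    return masks


def is_sparse_phase(epoch, total_epochs=100, warmup=10, period=5,
                    last_dense=10, final_sparse=15):
    """True if the epoch falls in a compressed (sparse) phase."""
    if epoch < warmup:
        return False                                  # dense warm-up
    if epoch >= total_epochs - final_sparse:
        return True                                   # final sparse fine-tuning
    if epoch >= total_epochs - final_sparse - last_dense:
        return False                                  # last dense phase
    return ((epoch - warmup) // period) % 2 == 0      # alternate every 5 epochs


def train_acdc(model, loader, loss_fn, sparsity=0.9, total_epochs=100, lr=0.1):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    masks, prev_sparse = None, False
    for epoch in range(total_epochs):
        sparse = is_sparse_phase(epoch, total_epochs)
        if sparse and not prev_sparse:
            masks = magnitude_masks(model, sparsity)  # recompute masks at phase start
        if prev_sparse and not sparse:
            # reset SGD momentum at the beginning of a decompression (dense) phase
            optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        prev_sparse = sparse
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            if sparse:  # keep pruned weights at zero during compressed phases
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        if name in masks:
                            p.mul_(masks[name])
```

In this sketch the masks are recomputed only at the start of each compressed phase and re-applied after every optimizer step to keep pruned weights at zero; the learning-rate schedule, mixed precision, and the exact pruning criterion from the paper are omitted.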