Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Authors: Anna Bair, Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose M. Alvarez
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | AdaSAP improves the robust accuracy of pruned models on image classification by up to +6% on ImageNet-C and +4% on ImageNet-V2, and on object detection by +4% on a corrupted Pascal VOC dataset, over a wide range of compression ratios, pruning criteria, and network architectures, outperforming recent pruning art by large margins. |
| Researcher Affiliation | Collaboration | Anna Bair (Carnegie Mellon University, abair@cmu.edu); Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose Alvarez (NVIDIA, {dannyy, mshen, pmolchanov, josea}@nvidia.com) |
| Pseudocode | Yes | Algorithm 1: AdaSAP Optimization Iteration; Algorithm 2: AdaSAP Pruning Procedure. (A hedged sharpness-aware update sketch follows the table.) |
| Open Source Code | No | No, the paper does not provide an explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | For image classification, we train on ImageNet-1K (Deng et al., 2009) and additionally evaluate on ImageNet-C (Hendrycks & Dietterich, 2019) and ImageNet-V2 (Recht et al., 2019). For object detection, we use the Pascal VOC dataset (Everingham et al., 2009). |
| Dataset Splits | Yes | For image classification, we report the Top-1 accuracy on each dataset and two robustness ratios, defined as the ratio of robust accuracy to validation accuracy: R_C = acc_C / acc_val and R_V2 = acc_V2 / acc_val. (A small helper after the table makes this explicit.) |
| Hardware Specification | Yes | We perform Distributed Data Parallel training across 8 V100 GPUs with batch size 128 for all experiments. |
| Software Dependencies | No | No, the paper describes the optimizer (SGD with cosine annealing learning rate, momentum, weight decay) and certain hyperparameter values, but does not list specific software dependencies with version numbers (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | The base optimizer is SGD with a cosine annealing learning rate schedule, a linear warmup over 8 epochs, a peak learning rate of 1.024, momentum of 0.875, and weight decay of 3.05e-05. Unless otherwise stated we use ρ_min = 0.01 and ρ_max = 2.0 for all experiments... We run the warm-up for 10 epochs, and then we follow the same pruning schedule... We fine-tune the pruned model for another 79 epochs (to reach 90 epochs total). (An optimizer/scheduler sketch follows the table.) |
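
For context on the Pseudocode row, below is a minimal sketch of a SAM-style sharpness-aware update in which each parameter gets its own perturbation radius clipped to [ρ_min, ρ_max]. This is not the paper's Algorithm 1: the `rho_by_param` mapping from importance scores to radii, the per-parameter gradient normalization (SAM proper normalizes by the global gradient norm), and all names here are assumptions made only to illustrate the two-step update.

```python
import torch


def adaptive_sam_step(model, loss_fn, inputs, targets, base_optimizer,
                      rho_by_param, rho_min=0.01, rho_max=2.0):
    """One SAM-style update where each parameter has its own perturbation radius.

    `rho_by_param` maps a parameter tensor to a radius in [rho_min, rho_max];
    deriving those radii from neuron importance is what the paper's Algorithm 1
    specifies and is not reproduced here.
    """
    # 1) Backward pass at the current weights to get the ascent direction.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # 2) Climb to a nearby point of higher loss: perturb each parameter along
    #    its normalized gradient, scaled by that parameter's own rho.
    perturbations = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            rho = min(max(rho_by_param.get(p, rho_min), rho_min), rho_max)
            e = rho * p.grad / (p.grad.norm() + 1e-12)
            p.add_(e)
            perturbations[p] = e
    model.zero_grad()

    # 3) Backward pass at the perturbed weights, undo the perturbation, and
    #    apply the base optimizer step with the sharpness-aware gradient.
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in perturbations.items():
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```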
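
The robustness ratios in the Dataset Splits row reduce to two divisions of robust accuracy by clean validation accuracy; the helper below (names are ours, not the paper's) just makes the definition explicit.

```python
def robustness_ratios(acc_val: float, acc_c: float, acc_v2: float) -> dict:
    """Robust accuracy (ImageNet-C, ImageNet-V2) relative to clean validation accuracy."""
    return {"R_C": acc_c / acc_val, "R_V2": acc_v2 / acc_val}
```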
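
The hyperparameters in the Experiment Setup row map onto a standard PyTorch optimizer/scheduler stack. The sketch below uses the quoted values; composing warmup and cosine decay with `SequentialLR`, and the placeholder model, are assumptions rather than the paper's released code.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Values quoted from the Experiment Setup row; schedule composition is an assumption.
EPOCHS, WARMUP_EPOCHS = 90, 8
PEAK_LR, MOMENTUM, WEIGHT_DECAY = 1.024, 0.875, 3.05e-05

model = torch.nn.Linear(10, 10)  # placeholder for the actual network

optimizer = torch.optim.SGD(
    model.parameters(), lr=PEAK_LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY
)

# Linear warmup to the peak LR over the first 8 epochs, cosine annealing afterwards.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=WARMUP_EPOCHS),
        CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS),
    ],
    milestones=[WARMUP_EPOCHS],
)

for epoch in range(EPOCHS):
    # ... one training epoch over ImageNet-1K (batch size 128) would run here ...
    scheduler.step()
```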