Don’t just prune by magnitude! Your mask topology is a secret weapon

Authors: Duc Hoang, Souvik Kundu, Shiwei Liu, Zhangyang "Atlas" Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To generate and evaluate results in this section, we utilize the ITOP framework [13] for its reliable performance in monitoring sparse structures and their evolving graph characteristics. Using magnitude pruning and random growth (initial renewal rate of 50%), we sample model masks and weights every 1,500 iterations over 250 epochs. After training these unique sparse models, we assess their graph characteristics, on which we base our subsequent observations. Notably, all our sparse subnetworks maintain 99% unstructured sparsity (only 1% of trained weights remain non-zero).
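The prune-and-grow cycle described above can be illustrated with a minimal numpy sketch. This is our own illustration of an ITOP-style mask update, not the authors' code: function names and shapes are assumptions, and only the mechanics (magnitude pruning of active weights, random regrowth of the same count, 99% sparsity preserved) follow the description.

```python
import numpy as np

def prune_and_grow(weights, mask, renewal_rate=0.5, seed=0):
    """One illustrative mask update: drop the smallest-magnitude active
    weights, then regrow the same number at random inactive positions,
    so the overall sparsity level is unchanged."""
    rng = np.random.default_rng(seed)
    active = np.flatnonzero(mask)
    n_renew = int(renewal_rate * active.size)

    # Magnitude pruning: remove the weakest active connections.
    order = np.argsort(np.abs(weights.ravel()[active]))
    new_mask = mask.copy()
    new_mask.ravel()[active[order[:n_renew]]] = 0

    # Random growth: activate the same number of inactive positions.
    inactive = np.flatnonzero(new_mask.ravel() == 0)
    grown = rng.choice(inactive, size=n_renew, replace=False)
    new_mask.ravel()[grown] = 1
    return new_mask

# 99% unstructured sparsity: roughly 1% of entries are non-zero.
rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))
m = (rng.random((64, 64)) < 0.01).astype(np.int8)
m2 = prune_and_grow(w, m)
assert m2.sum() == m.sum()  # sparsity level preserved across the update
```

Each such update yields a distinct sparse topology; sampling these masks periodically during training is what produces the population of subnetworks whose graph characteristics are then analyzed.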
Researcher Affiliation | Collaboration | Duc Hoang, University of Texas at Austin, hoangduc@utexas.edu; Souvik Kundu, Intel Labs, souvikk.kundu@intel.com; Shiwei Liu, Eindhoven University of Technology and University of Texas at Austin, s.liu3@tue.nl; Zhangyang Wang, University of Texas at Austin, atlaswang@utexas.edu
Pseudocode | Yes | Algorithm 1: Pruning at Initialization as Graph Sampling (PAGS)
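The named algorithm, Pruning at Initialization as Graph Sampling (PAGS), can be sketched as sampling many cheap masks at initialization and keeping the one whose connectivity graph scores best. This is a hedged reconstruction from the algorithm's title and the reported defaults (n = 1,000 samples): the `graph_score` below is a placeholder (degree balance), not the paper's actual topology metric.

```python
import numpy as np

def graph_score(mask):
    """Placeholder topology score (our assumption, not the paper's metric):
    negative variance of per-output-node degree, favoring masks whose
    bipartite connectivity graph has balanced connections."""
    degrees = mask.sum(axis=1)
    return -degrees.var()

def pags(shape, sparsity=0.99, n=1000, seed=0):
    """PAGS sketch: sample n random masks at a fixed sparsity and return
    the one with the best graph score, before any training."""
    rng = np.random.default_rng(seed)
    best_mask, best = None, -np.inf
    for _ in range(n):
        mask = (rng.random(shape) < 1 - sparsity).astype(np.int8)
        score = graph_score(mask)
        if score > best:
            best_mask, best = mask, score
    return best_mask

# Small n for illustration; the paper's experiments use n = 1,000.
best = pags((128, 128), n=50)
```

The key design point is that mask generation and scoring require no gradient computation, so sampling hundreds of candidate topologies at initialization stays cheap.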
Open Source Code | Yes | Code can be found at: https://github.com/VITA-Group/FullSpectrum-PAI
Open Datasets | Yes | We demonstrate results on CIFAR-10/CIFAR-100 in the main text. We use two representative models, ResNet-18 and ResNet-34, as the main backbones in this section. Additional training details and results on Tiny-ImageNet are deferred to the Supplementary due to the space limit. The CIFAR-10 dataset consists of 50,000 training images and 10,000 testing images, divided into 10 classes. The Tiny-ImageNet dataset consists of 100,000 images, divided into 200 classes.
Dataset Splits | Yes | The CIFAR-10 dataset consists of 50,000 training images and 10,000 testing images, divided into 10 classes. The Tiny-ImageNet dataset consists of 100,000 images, divided into 200 classes.
Hardware Specification | No | The paper does not mention any specific hardware components (e.g., GPU models, CPU types, memory specifications) used for running the experiments. It only refers to general training processes.
Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers (e.g., Python version, deep learning frameworks such as PyTorch or TensorFlow, or other libraries).
Experiment Setup | Yes | Using magnitude pruning and random growth (initial renewal rate of 50%), we sample model masks and weights every 1,500 iterations over 250 epochs. Notably, all our sparse subnetworks maintain 99% unstructured sparsity (only 1% of trained weights remain non-zero). In all our experiments, we adopt the defaults n = 1,000, i = 20, and minibatch size 128.
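The reported hyperparameters can be collected into a single configuration for reference. The dictionary keys below are our own labels; only the values come from the paper's stated setup.

```python
# Hyperparameters reported in the paper's setup (key names are ours):
CONFIG = {
    "sparsity": 0.99,              # 99% unstructured sparsity (1% weights non-zero)
    "renewal_rate": 0.50,          # initial prune/grow fraction per mask update
    "mask_sample_interval": 1500,  # iterations between mask/weight snapshots
    "epochs": 250,                 # total training epochs
    "n_masks": 1000,               # n: number of sampled masks
    "eval_iters": 20,              # i: iterations used per evaluation
    "batch_size": 128,             # minibatch size
}
```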