Don’t just prune by magnitude! Your mask topology is a secret weapon
Authors: Duc Hoang, Souvik Kundu, Shiwei Liu, Zhangyang "Atlas" Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To generate and evaluate results in this section, we utilize the ITOP framework [13] for its reliable performance in monitoring sparse structures and their evolving graph characteristics. Using magnitude pruning and random growth (initial renewal rate of 50%), we sample model masks and weights every 1500 iterations over 250 epochs. After training these unique sparse models, we assess their graph characteristics, on which we base our subsequent observations. Notably, all our sparse subnetworks maintain 99% unstructured sparsity (only 1% of trained weights remain non-zero). (A sketch of this prune-and-grow mask update follows the table.) |
| Researcher Affiliation | Collaboration | Duc Hoang, University of Texas at Austin, hoangduc@utexas.edu; Souvik Kundu, Intel Labs, souvikk.kundu@intel.com; Shiwei Liu, Eindhoven University of Technology and University of Texas at Austin, s.liu3@tue.nl; Zhangyang Wang, University of Texas at Austin, atlaswang@utexas.edu |
| Pseudocode | Yes | Algorithm 1: Pruning at Initialization as Graph Sampling (PAGS) (a minimal code sketch follows the table) |
| Open Source Code | Yes | Codes can be found at: https://github.com/VITA-Group/FullSpectrum-PAI. |
| Open Datasets | Yes | We demonstrate results on CIFAR-10/CIFAR-100 in the main text. We use two representative models, ResNet-18 and ResNet-34, as the main backbones in this section. Additional training details and results on Tiny-ImageNet are deferred to the Supplementary due to the space limit. The CIFAR-10 dataset consists of 50,000 training images and 10,000 testing images, divided into 10 different classes. The Tiny-ImageNet dataset consists of 100,000 images, divided into 200 different classes. |
| Dataset Splits | Yes | The CIFAR-10 dataset consists of 50,000 training images and 10,000 testing images, divided into 10 different classes. The Tiny-ImageNet dataset consists of 100,000 images, divided into 200 different classes. |
| Hardware Specification | No | The paper does not mention any specific hardware components (e.g., GPU models, CPU types, memory specifications) used for running the experiments. It only refers to general training processes. |
| Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers (e.g., Python version, specific deep learning frameworks like PyTorch or TensorFlow, or other libraries). |
| Experiment Setup | Yes | Using magnitude pruning and random growth (initial renewal rate of 50%), we sample model masks and weights every 1500 iterations over 250 epochs. Notably, all our sparse subnetworks maintain 99% unstructured sparsity (only 1% of trained weights remain non-zero). In all our experiments, we adopt the default n = 1,000, i = 20, and minibatch size 128. (See the hyperparameter sketch below the table.) |
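
The sampling procedure quoted in the Research Type row follows an ITOP-style prune-and-grow loop: at each update, a fraction of the lowest-magnitude active connections is removed and the same number of inactive connections is re-activated at random, keeping total sparsity fixed (here at 99%). Below is a minimal PyTorch sketch of one such mask update; `prune_and_grow` is a hypothetical helper name, not the paper's released API.

```python
import torch

def prune_and_grow(weight: torch.Tensor, mask: torch.Tensor,
                   renewal_rate: float = 0.5) -> torch.Tensor:
    # One magnitude-prune / random-grow mask update, as in dynamic sparse
    # training. Hypothetical sketch; the paper's implementation lives in
    # its released ITOP-based code.
    active = mask.bool()
    k = int(renewal_rate * active.sum().item())  # connections to replace

    # Prune: deactivate the k active weights with the smallest magnitude.
    scores = torch.where(active, weight.abs(),
                         torch.full_like(weight, float("inf")))
    _, drop_idx = torch.topk(scores.flatten(), k, largest=False)
    new_mask = mask.reshape(-1).clone()
    new_mask[drop_idx] = 0

    # Grow: reactivate k currently inactive positions uniformly at random,
    # so the overall 99% sparsity level is preserved.
    inactive_idx = (new_mask == 0).nonzero(as_tuple=True)[0]
    perm = torch.randperm(inactive_idx.numel(), device=inactive_idx.device)
    new_mask[inactive_idx[perm[:k]]] = 1
    return new_mask.reshape(mask.shape)
```

Sampling the mask every 1500 iterations of such training over 250 epochs yields the pool of sparse subnetworks whose graph characteristics the paper then analyzes.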
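Algorithm 1 (PAGS) can be read as a sampling loop over cheap pruning-at-initialization (PaI) masks: generate many candidate masks, score each one's graph topology, and keep the best, with no training inside the loop. The sketch below assumes two callables, `pai_prune` (any PaI mask generator, e.g. a SNIP-style scorer) and `graph_score` (a mask-topology metric); both names are stand-ins, not the paper's exact interface.

```python
def pags(model_init, pai_prune, graph_score, n: int = 1000):
    # Pruning at Initialization as Graph Sampling (PAGS): minimal sketch
    # under the assumptions stated above.
    best_mask, best_score = None, float("-inf")
    for _ in range(n):
        mask = pai_prune(model_init)  # cheap PaI mask; randomness from seed/minibatch
        score = graph_score(mask)     # topology metric on the sparse mask's graph
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score
```

Because each candidate mask costs only a forward-style saliency computation rather than a training run, drawing n = 1,000 samples stays inexpensive relative to training even one subnetwork.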
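Finally, the defaults quoted in the Experiment Setup row (n = 1,000, i = 20, minibatch size 128) would plug into such a loop roughly as follows; `snip_mask`, `ramanujan_score`, and `model` are hypothetical stand-ins, and the exact role of i = 20 here is an assumption.

```python
# Hypothetical wiring of the reported defaults; not the paper's API.
n_samples = 1000   # n: candidate masks sampled per PaI method
n_iters = 20       # i: per-candidate iteration budget (assumed meaning)
batch_size = 128   # minibatch size used when computing saliency scores

# n_iters and batch_size would parameterize the pai_prune generator.
best_mask, best_score = pags(model, pai_prune=snip_mask,
                             graph_score=ramanujan_score, n=n_samples)
```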