Pruning has a disparate impact on model accuracy

Authors: Cuong Tran, Ferdinando Fioretto, Jung-Eun Kim, Rakshit Naidu

NeurIPS 2022

Reproducibility assessment (each entry gives the variable, the result, and the LLM response):
Research Type: Experimental. The theoretical findings suggest the presence of two key factors responsible for why accuracy disparities arise in pruned models: (1) disparity in gradient norms across groups, and (2) disparity in the Hessian matrices associated with the loss function computed using a group's data. Informally, the former carries information about the groups' local optimality, while the latter relates to model separability. The paper analyzes these factors in detail, providing both theoretical and empirical support on a variety of settings, networks, and datasets.
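These two factors can be read off a second-order Taylor expansion of a group's loss around the dense parameters. The sketch below is a standard derivation consistent with that description, not a formula quoted from the paper; here \theta denotes the dense model, \bar{\theta} the pruned model, and L_a the loss on group a's data.

```latex
% Excess loss of group a under pruning, via a second-order Taylor expansion.
% \theta: dense parameters, \bar{\theta}: pruned parameters, L_a: loss on group a's data.
L_a(\bar{\theta}) - L_a(\theta)
  \approx \underbrace{\nabla L_a(\theta)^\top (\bar{\theta} - \theta)}_{\text{gradient-norm term}}
  + \underbrace{\tfrac{1}{2}\, (\bar{\theta} - \theta)^\top H_a(\theta)\, (\bar{\theta} - \theta)}_{\text{Hessian term}},
  \qquad H_a(\theta) = \nabla^2 L_a(\theta).
```

Under this reading, a group with a larger gradient norm at the dense solution (it is farther from its own local optimum) or a larger Hessian term (its loss curves more sharply) incurs a larger accuracy drop from the same pruning perturbation.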
Researcher Affiliation: Academia. Cuong Tran, Department of Computer Science, Syracuse University (cutran@syr.edu); Ferdinando Fioretto, Department of Computer Science, Syracuse University (ffiorett@syr.edu); Jung-Eun Kim, Department of Computer Science, North Carolina State University (jung-eun.kim@ncsu.edu); Rakshit Naidu, Department of Computer Science, Carnegie Mellon University (rnemakal@andrew.cmu.edu).
Pseudocode: No. The paper describes methods in prose and mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code: No. The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets: Yes. These results use the UTKFace dataset [41] for a vision task whose goal is to classify ethnicity. The experiments use a ResNet-18 architecture, and the pruning counterparts remove the P% of parameters with the smallest absolute values for various P. All reported metrics are normalized and averaged over 10 repetitions.
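As a concrete illustration of the pruning scheme this entry describes (removing the P% of weights with smallest absolute value), a minimal PyTorch sketch follows. The function name and the choice of a single global magnitude threshold are illustrative assumptions; the paper does not publish code, and it may prune per layer instead.

```python
import torch

def magnitude_prune(model: torch.nn.Module, p: float) -> None:
    """Zero out the p% of weights with smallest absolute value, globally.

    Illustrative sketch of magnitude pruning as described in the report;
    not code from the paper.
    """
    # Gather all weight magnitudes into one vector to find a global threshold.
    all_weights = torch.cat([
        param.detach().abs().flatten()
        for name, param in model.named_parameters()
        if "weight" in name
    ])
    k = int(len(all_weights) * p / 100.0)
    if k == 0:
        return
    # kthvalue returns the k-th smallest magnitude; weights at or below it are pruned.
    threshold = all_weights.kthvalue(k).values
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" in name:
                param.mul_((param.abs() > threshold).to(param.dtype))
```

For example, `magnitude_prune(model, p=90.0)` would zero the 90% of weights with smallest magnitude, matching the "P = 90" operating point style reported in such experiments.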
Dataset Splits: No. The paper mentions using datasets such as UTKFace, CIFAR-10, and SVHN, and training models, but it does not explicitly state the training, validation, and test dataset splits (e.g., percentages or sample counts for each split) in the main body or the referenced appendices.
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies: No. The paper mentions neural network architectures (e.g., ResNet-18, ResNet-50, VGG19) and the SGD optimizer but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. The models are trained using the SGD optimizer with momentum 0.9 and an initial learning rate of 0.01, with a cosine annealing scheduler, for 100 epochs.
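The quoted setup maps directly onto standard PyTorch components. The snippet below is a hedged reconstruction of that configuration; the model choice, number of classes, and the omission of weight decay are assumptions, not details from the paper.

```python
import torch
from torchvision.models import resnet18

# Reconstruction of the reported training configuration:
# SGD with momentum 0.9, initial lr 0.01, cosine annealing over 100 epochs.
model = resnet18(num_classes=5)  # e.g., UTKFace ethnicity classes; assumed
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one pass over the training loader, stepping the optimizer per batch ...
    scheduler.step()  # anneal the learning rate once per epoch
```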