Balancing Act: Constraining Disparate Impact in Sparse Models

Authors: Meraj Hashemizadeh, Juan Ramirez, Rohan Sukumaran, Golnoosh Farnadi, Simon Lacoste-Julien, Jose Gallego-Posada

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups."
Researcher Affiliation | Academia | "¹Mila ²DIRO, Université de Montréal ³McGill University ⁴Canada CIFAR AI Chair"
Pseudocode | Yes | "Algorithm 1: Constrained Excess Accuracy Gap (CEAG)" (a hedged sketch of this kind of update appears after the table)
Open Source Code | Yes | "Our code is available here: https://github.com/merajhashemi/balancing-act"
Open Datasets | Yes | "We carry out experiments on the FairFace (Kärkkäinen & Joo, 2021) and UTKFace (Zhang et al., 2017) datasets, following the works of Lin et al. (2022) and Tran et al. (2022). Additionally, we perform experiments on CIFAR-100 (Krizhevsky, 2009)."
Dataset Splits | Yes | "The choice of buffer size k introduces a trade-off between reducing the variance of the constraints and biasing estimates towards old measurements. ... We fine-tune sparse models on UTKFace and CIFAR for 45 epochs, and for 32 epochs on FairFace. ... For UTKFace and CIFAR-100, we set scheduler milestones at 60%, 80% and 90% of the total training epochs (including the execution of GMP). ... NFT+ES: the best iterate of NFT in terms of test accuracy (early stopping)" (the buffer trade-off is illustrated in a sketch after the table)
Hardware Specification | Yes | "Table 14: Runtime of different mitigation approaches on CIFAR-100 at 95% sparsity. All runs are executed on NVIDIA A100-SXM4-80GB GPUs. Runtimes are averaged across 5 runs for each mitigation method."
Software Dependencies | Yes | "Our implementations use PyTorch 1.13.0 (Paszke et al., 2019) and the Cooper library for constrained optimization (Gallego-Posada & Ramirez, 2022)."
Experiment Setup | Yes | "For the UTKFace and CIFAR-100 datasets, we employ a primal step size of 1 × 10⁻² along with a momentum of 0.9 (Polyak), and apply weight decay at the rate of 1 × 10⁻⁴. ... For FairFace, we employ Nesterov momentum with a step size of 1 × 10⁻³ and apply a weight decay of 1 × 10⁻². ... We highlight the data transformations and the batch size we employ for each dataset in Table 6." (a minimal optimizer/scheduler configuration matching these quoted values appears after the table)
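
The Pseudocode row points to Algorithm 1 (CEAG), which treats disparate-impact mitigation as a constrained optimization problem over per-group accuracy gaps. Below is a minimal sketch of the generic gradient descent-ascent update that such an algorithm instantiates, written in PyTorch since that is the framework the paper reports. All names here (`gda_step`, `surrogate_gaps`, `measured_gaps`, `epsilon`) are hypothetical, and the pairing of a differentiable surrogate for the primal update with measured accuracy gaps for the dual update is an assumption of this sketch, not a transcription of the authors' Algorithm 1.

```python
import torch

def gda_step(loss, surrogate_gaps, measured_gaps, epsilon,
             multipliers, primal_opt, dual_lr):
    """One simultaneous gradient descent-ascent step on the Lagrangian

        L(theta, lambda) = loss(theta) + sum_g lambda_g * (gap_g(theta) - epsilon).

    `surrogate_gaps` must be differentiable in the model parameters
    (accuracy gaps themselves carry no gradient), while `measured_gaps`
    holds detached accuracy-gap estimates used for the dual update.
    This surrogate/measurement split is an assumption of the sketch.
    """
    constraints = surrogate_gaps - epsilon           # one entry per sub-group
    lagrangian = loss + torch.dot(multipliers.detach(), constraints)

    primal_opt.zero_grad()
    lagrangian.backward()
    primal_opt.step()                                # descent on model weights

    with torch.no_grad():                            # ascent on multipliers,
        multipliers += dual_lr * (measured_gaps - epsilon)
        multipliers.clamp_(min=0.0)                  # projected onto lambda >= 0
```

The Cooper library quoted in the Software Dependencies row packages this style of Lagrangian update behind a constrained-optimizer interface; the loop above only shows the bare mechanics.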
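
The Dataset Splits row quotes a trade-off in the buffer size k used when estimating the constraints. Here is a minimal sketch of that mechanism, assuming a plain moving average over the last k measurements; the paper's exact aggregation may differ.

```python
from collections import deque

class ConstraintBuffer:
    """Smooths noisy per-group constraint measurements over the last k
    mini-batches. A larger k reduces the variance of the estimate but
    biases it towards old (stale) measurements, which is the trade-off
    the quoted passage describes."""

    def __init__(self, k: int):
        self.values = deque(maxlen=k)  # drops the oldest entry once full

    def update(self, measurement: float) -> float:
        self.values.append(measurement)
        return sum(self.values) / len(self.values)
```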
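
The Software Dependencies and Experiment Setup rows together pin down most of the optimizer configuration. The sketch below reproduces the quoted values with stock PyTorch calls; the MultiStepLR decay factor and the FairFace momentum value are not quoted, so the `gamma=0.1` and `momentum=0.9` used here are assumptions, as is the `make_optimizer_and_scheduler` helper itself.

```python
import torch

def make_optimizer_and_scheduler(model, dataset: str, total_epochs: int):
    """Optimizer configuration matching the quoted hyperparameters.
    `gamma=0.1` and the FairFace momentum of 0.9 are assumed, since the
    quoted text does not state them."""
    if dataset in ("utkface", "cifar100"):
        # Primal step size 1e-2, Polyak momentum 0.9, weight decay 1e-4.
        opt = torch.optim.SGD(model.parameters(), lr=1e-2,
                              momentum=0.9, weight_decay=1e-4)
        # Milestones at 60%, 80% and 90% of the total training epochs.
        milestones = [int(f * total_epochs) for f in (0.6, 0.8, 0.9)]
        sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones, gamma=0.1)
    elif dataset == "fairface":
        # Nesterov momentum, step size 1e-3, weight decay 1e-2.
        opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9,
                              nesterov=True, weight_decay=1e-2)
        sched = None  # no milestone schedule is quoted for FairFace
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    return opt, sched
```

With the quoted fine-tuning budgets, `total_epochs` would be 45 for UTKFace and CIFAR-100 (milestones at epochs 27, 36 and 40) and 32 for FairFace.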