Balancing Act: Constraining Disparate Impact in Sparse Models
Authors: Meraj Hashemizadeh, Juan Ramirez, Rohan Sukumaran, Golnoosh Farnadi, Simon Lacoste-Julien, Jose Gallego-Posada
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups. |
| Researcher Affiliation | Academia | ¹Mila ²DIRO, Université de Montréal ³McGill University ⁴Canada CIFAR AI Chair |
| Pseudocode | Yes (a sketch of the descent-ascent scheme appears below the table) | Algorithm 1 Constrained Excess Accuracy Gap (CEAG) |
| Open Source Code | Yes | Our code is available here: https://github.com/merajhashemi/balancing-act |
| Open Datasets | Yes | We carry out experiments on the FairFace (Kärkkäinen & Joo, 2021) and UTKFace (Zhang et al., 2017) datasets, following the works of Lin et al. (2022) and Tran et al. (2022). Additionally, we perform experiments on CIFAR-100 (Krizhevsky, 2009) |
| Dataset Splits | Yes (the GMP schedule is sketched below the table) | The choice of buffer size k introduces a trade-off between reducing the variance of the constraints, and biasing estimates towards old measurements. ... We fine-tune sparse models on UTKFace and CIFAR for 45 epochs, and for 32 epochs on FairFace. ... For UTKFace and CIFAR-100, we set scheduler milestones at 60%, 80% and 90% of the total training epochs (including the execution of GMP). ... NFT+ES: the best iterate of NFT in terms of test accuracy (early stopping) |
| Hardware Specification | Yes | Table 14: Runtime of different mitigation approaches on CIFAR-100 at 95% sparsity. All runs are executed on NVIDIA A100-SXM4-80GB GPUs. Runtimes are averaged across 5 runs for each mitigation method. |
| Software Dependencies | Yes | Our implementations use PyTorch 1.13.0 (Paszke et al., 2019) and the Cooper library for constrained optimization (Gallego-Posada & Ramirez, 2022). |
| Experiment Setup | Yes (the optimizer configuration is sketched below the table) | For the UTKFace and CIFAR-100 datasets, we employ a primal step size of 1×10⁻² along with a momentum of 0.9 (Polyak), and apply weight decay at a rate of 1×10⁻⁴. ... For FairFace, we employ Nesterov momentum with a step size of 1×10⁻³ and apply a weight decay of 1×10⁻². ... We highlight the data transformations and the batch size we employ for each dataset in Table 6. |
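The Pseudocode row refers to the paper's Algorithm 1 (CEAG), which trains the sparse model under per-group constraints on the excess accuracy gap via Lagrangian gradient descent-ascent. Below is a minimal plain-PyTorch sketch of one such update, written against the general idea rather than the authors' implementation (which uses the Cooper library); the differentiable surrogate for the constraint, the `tol` tolerance, the group-labeled batches, and every helper name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ceag_style_step(model, primal_opt, lambdas, dual_lr, tol,
                    inputs, labels, groups, dense_group_acc, dense_overall_acc):
    """One descent-ascent update enforcing per-group excess-accuracy-gap
    constraints. A sketch of the general scheme, not the paper's Algorithm 1.

    lambdas            : non-negative multipliers, one per protected group.
    dense_group_acc[g] : pre-computed accuracy of the dense model on group g.
    dense_overall_acc  : pre-computed overall accuracy of the dense model.
    """
    logits = model(inputs)
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    lagrangian = per_sample_loss.mean()

    preds = logits.argmax(dim=1)  # argmax is detached; no gradient flows here
    overall_gap = dense_overall_acc - (preds == labels).float().mean()

    num_groups = lambdas.numel()
    excess = torch.zeros(num_groups, device=inputs.device)
    for g in range(num_groups):
        mask = groups == g
        if mask.any():
            # Primal signal: a differentiable surrogate (the group's loss)
            # weighted by its multiplier; accuracy itself is not differentiable.
            lagrangian = lagrangian + lambdas[g] * per_sample_loss[mask].mean()
            # Dual signal: the observed excess accuracy gap of group g.
            group_gap = dense_group_acc[g] - (preds[mask] == labels[mask]).float().mean()
            excess[g] = group_gap - overall_gap

    primal_opt.zero_grad()
    lagrangian.backward()
    primal_opt.step()

    # Projected gradient ascent on the multipliers (kept non-negative).
    with torch.no_grad():
        lambdas.add_(dual_lr * (excess - tol)).clamp_(min=0.0)
    return lagrangian.item()
```

The buffer of size k mentioned in the Dataset Splits row would slot in here by averaging `excess` over the last k batches before the dual update, trading constraint variance against staleness, exactly the trade-off the quote describes.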
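The scheduler quote above mentions "the execution of GMP", i.e. gradual magnitude pruning preceding fine-tuning, and the runtime row reports results at 95% sparsity. The sketch below shows one common way to realize a GMP schedule with `torch.nn.utils.prune`; the cubic ramp and the epoch bounds are assumptions, since the excerpt names GMP without detailing its schedule.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def gmp_target_sparsity(epoch, start_epoch, end_epoch, final_sparsity=0.95):
    # Cubic sparsity ramp (a common GMP choice; assumed, not quoted from the paper).
    if epoch < start_epoch:
        return 0.0
    if epoch >= end_epoch:
        return final_sparsity
    frac = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

def prune_to_sparsity(model, current_sparsity, target_sparsity):
    """Globally prune the smallest-magnitude weights up to target_sparsity.

    global_unstructured prunes a fraction of the *remaining* weights when
    applied iteratively, so the overall target is converted to an increment.
    """
    if target_sparsity <= current_sparsity:
        return current_sparsity
    amount = (target_sparsity - current_sparsity) / (1.0 - current_sparsity)
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=amount)
    return target_sparsity
```

During the GMP phase one would call `prune_to_sparsity(model, s, gmp_target_sparsity(epoch, ...))` once per epoch, then fine-tune the fixed-mask model for the 45 (UTKFace/CIFAR) or 32 (FairFace) epochs quoted above.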
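Finally, the Experiment Setup row maps directly onto PyTorch optimizer configurations. The sketch below mirrors the quoted hyper-parameters; the Nesterov momentum coefficient, the placeholder model, the total epoch count, and the MultiStepLR decay factor (left at PyTorch's default of 0.1) are assumptions not stated in the excerpt.

```python
import torch

model = torch.nn.Linear(512, 100)  # placeholder; the paper sparsifies larger networks

# UTKFace and CIFAR-100: primal step size 1e-2, Polyak (heavy-ball) momentum 0.9,
# weight decay 1e-4, as quoted above.
opt_utk_cifar = torch.optim.SGD(model.parameters(), lr=1e-2,
                                momentum=0.9, weight_decay=1e-4)

# FairFace: Nesterov momentum, step size 1e-3, weight decay 1e-2.
# The momentum coefficient 0.9 is an assumption; the excerpt does not give it.
opt_fairface = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9,
                               nesterov=True, weight_decay=1e-2)

# Milestones at 60%, 80%, and 90% of the total training epochs, which per the
# quote include the GMP phase. total_epochs is a placeholder value.
total_epochs = 100
milestones = [int(total_epochs * f) for f in (0.6, 0.8, 0.9)]
scheduler = torch.optim.lr_scheduler.MultiStepLR(opt_utk_cifar, milestones=milestones)
```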