Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Balancing Act: Constraining Disparate Impact in Sparse Models
Authors: Meraj Hashemizadeh, Juan Ramirez, Rohan Sukumaran, Golnoosh Farnadi, Simon Lacoste-Julien, Jose Gallego-Posada
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups. |
| Researcher Affiliation | Academia | ¹Mila; ²DIRO, Université de Montréal; ³McGill University; ⁴Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1 Constrained Excess Accuracy Gap (CEAG) |
| Open Source Code | Yes | Our code is available here: https://github.com/merajhashemi/balancing-act |
| Open Datasets | Yes | We carry out experiments on the FairFace (Kärkkäinen & Joo, 2021) and UTKFace (Zhang et al., 2017) datasets, following the works of Lin et al. (2022) and Tran et al. (2022). Additionally, we perform experiments on CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | Yes | The choice of buffer size k introduces a trade-off between reducing the variance of the constraints, and biasing estimates towards old measurements. ... We fine-tune sparse models on UTKFace and CIFAR for 45 epochs, and for 32 epochs on FairFace. ... For UTKFace and CIFAR-100, we set scheduler milestones at 60%, 80% and 90% of the total training epochs (including the execution of GMP). ... NFT+ES: the best iterate of NFT in terms of test accuracy (early stopping) |
| Hardware Specification | Yes | Table 14: Runtime of different mitigation approaches on CIFAR-100 at 95% sparsity. All runs are run on NVIDIA A100-SXM4-80GB GPUs. Runtimes are averaged across 5 runs for each mitigation method. |
| Software Dependencies | Yes | Our implementations use PyTorch 1.13.0 (Paszke et al., 2019) and the Cooper library for constrained optimization (Gallego-Posada & Ramirez, 2022). |
| Experiment Setup | Yes | For UTKFace and CIFAR-100 datasets, we employ a primal step size of 1 × 10⁻² along with a momentum of 0.9 (Polyak), and apply weight decay at the rate of 1 × 10⁻⁴. ... For FairFace, we employ Nesterov momentum with a step size of 1 × 10⁻³ and apply a weight decay of 1 × 10⁻². ... We highlight the data transformations and the batch size we employ for each dataset in Table 6. |
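The Dataset Splits excerpt notes that the buffer size k trades variance in the constraint estimates against bias toward old measurements. The plain-Python sketch below illustrates that trade-off under stated assumptions: it takes a group's excess accuracy gap to be its accuracy drop (dense minus sparse) minus the model's overall drop, keeps a sliding window of the last k per-batch measurements, and feeds the window mean into a projected gradient-ascent step on a Lagrange multiplier. The names `excess_accuracy_gap`, `BufferedConstraint`, and `dual_step` are hypothetical. This is not the authors' CEAG implementation (which builds on the Cooper library), and it omits the primal update on the model weights.

```python
from collections import deque


def excess_accuracy_gap(dense_group_acc, sparse_group_acc, dense_acc, sparse_acc):
    """Assumed definition: a group's accuracy drop minus the overall drop."""
    group_drop = dense_group_acc - sparse_group_acc
    overall_drop = dense_acc - sparse_acc
    return group_drop - overall_drop


class BufferedConstraint:
    """Hypothetical helper: mean of the last k measurements of one constraint."""

    def __init__(self, k: int):
        # A deque with maxlen discards the oldest value once k values are stored.
        self.buffer = deque(maxlen=k)

    def update(self, value: float) -> float:
        self.buffer.append(value)
        return sum(self.buffer) / len(self.buffer)


def dual_step(multiplier: float, estimate: float, eps: float, lr: float) -> float:
    """Projected gradient ascent for the constraint `estimate <= eps`.

    The multiplier grows while the estimate exceeds eps, shrinks otherwise,
    and is clipped at 0 so it stays non-negative.
    """
    return max(0.0, multiplier + lr * (estimate - eps))


if __name__ == "__main__":
    buf = BufferedConstraint(k=3)
    lam = 0.0
    # Per-batch excess accuracy gaps for one group (illustrative values only,
    # not results from the paper).
    for gap in [0.10, 0.02, 0.06, 0.04]:
        est = buf.update(gap)
        lam = dual_step(lam, est, eps=0.05, lr=1.0)
        print(f"gap={gap:.2f} estimate={est:.3f} multiplier={lam:.3f}")
```

With a small k the estimate reacts to every batch but is noisy; with a large k it is smoother but lags behind recent changes in the sparse model, which is the trade-off the excerpt describes.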
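The Experiment Setup and Dataset Splits rows quote concrete optimizer and schedule settings. The sketch below collects them into a plain-Python configuration and computes the step-size milestones at 60%, 80% and 90% of the total epoch count. The dictionary layout and the choice to round milestones down to whole epochs are assumptions; FairFace's momentum coefficient is not given in the excerpt and is left as `None`, and the milestone percentages are only stated for UTKFace and CIFAR-100.

```python
# Primal optimizer settings quoted in the Experiment Setup row.
# The dictionary layout is illustrative, not taken from the authors' code.
PRIMAL_OPTIMIZER = {
    "utkface": {"lr": 1e-2, "momentum": 0.9, "nesterov": False, "weight_decay": 1e-4},
    "cifar100": {"lr": 1e-2, "momentum": 0.9, "nesterov": False, "weight_decay": 1e-4},
    # The momentum coefficient for FairFace is not stated in the excerpt.
    "fairface": {"lr": 1e-3, "momentum": None, "nesterov": True, "weight_decay": 1e-2},
}

# Milestones as percentages of the total training epochs (UTKFace, CIFAR-100),
# where the total includes the gradual magnitude pruning (GMP) phase.
MILESTONE_PERCENTS = (60, 80, 90)


def scheduler_milestones(total_epochs: int, percents=MILESTONE_PERCENTS) -> list[int]:
    """Return the epochs at which the step size decays (rounded down, assumed)."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return [total_epochs * p // 100 for p in percents]


if __name__ == "__main__":
    print(scheduler_milestones(100))
```

In PyTorch these values would typically be passed to `torch.optim.SGD(params, lr=..., momentum=..., weight_decay=..., nesterov=...)` and the milestones to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=...)`. The decay factor `gamma` is not stated in the excerpt, and `total_epochs` must count the GMP phase as well as fine-tuning.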