Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Balancing Act: Constraining Disparate Impact in Sparse Models

Authors: Meraj Hashemizadeh, Juan Ramirez, Rohan Sukumaran, Golnoosh Farnadi, Simon Lacoste-Julien, Jose Gallego-Posada

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.
Researcher Affiliation | Academia | (1) Mila; (2) DIRO, Université de Montréal; (3) McGill University; (4) Canada CIFAR AI Chair
Pseudocode | Yes | Algorithm 1 Constrained Excess Accuracy Gap (CEAG)
Open Source Code | Yes | Our code is available here: https://github.com/merajhashemi/balancing-act
Open Datasets | Yes | We carry out experiments on the FairFace (Kärkkäinen & Joo, 2021) and UTKFace (Zhang et al., 2017) datasets, following the works of Lin et al. (2022) and Tran et al. (2022). Additionally, we perform experiments on CIFAR-100 (Krizhevsky, 2009).
Dataset Splits | Yes | The choice of buffer size k introduces a trade-off between reducing the variance of the constraints, and biasing estimates towards old measurements. ... We fine-tune sparse models on UTKFace and CIFAR for 45 epochs, and for 32 epochs on FairFace. ... For UTKFace and CIFAR-100, we set scheduler milestones at 60%, 80% and 90% of the total training epochs (including the execution of GMP). ... NFT+ES: the best iterate of NFT in terms of test accuracy (early stopping)
Hardware Specification | Yes | Table 14: Runtime of different mitigation approaches on CIFAR-100 at 95% sparsity. All runs are run on NVIDIA A100-SXM4-80GB GPUs. Runtimes are average across 5 runs for each mitigation method.
Software Dependencies | Yes | Our implementations use PyTorch 1.13.0 (Paszke et al., 2019) and the Cooper library for constrained optimization (Gallego-Posada & Ramirez, 2022).
Experiment Setup | Yes | For UTKFace and CIFAR-100 datasets, we employ a primal step size of 1 × 10^-2 along with a momentum of 0.9 (Polyak), and apply weight decay at the rate of 1 × 10^-4. ... For FairFace, we employ Nesterov momentum with a step-size of 1 × 10^-3 and apply a weight decay of 1 × 10^-2. ... We highlight the data transformations and the batch size we employ for each dataset in Table 6.
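
The hyperparameters quoted in the Experiment Setup and Dataset Splits rows can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `PrimalOptimizerConfig`, `CONFIGS`, and `milestone_epochs` are hypothetical, the quoted text gives no momentum value for FairFace (so it is left as `None`), and the total epoch count used in the example is a placeholder because the quoted text does not state it. The exponents are read as negative powers of ten, which is an assumption about the garbled source values.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PrimalOptimizerConfig:
    """Primal optimizer settings as quoted in the Experiment Setup row."""
    step_size: float
    weight_decay: float
    momentum: Optional[float]  # None where the quoted text gives no value
    nesterov: bool             # False means classical (Polyak) momentum


# Hypothetical container; keys are the dataset names used in the table.
CONFIGS = {
    "UTKFace": PrimalOptimizerConfig(1e-2, 1e-4, 0.9, nesterov=False),
    "CIFAR-100": PrimalOptimizerConfig(1e-2, 1e-4, 0.9, nesterov=False),
    # Momentum value for FairFace is not given in the quoted text.
    "FairFace": PrimalOptimizerConfig(1e-3, 1e-2, None, nesterov=True),
}


def milestone_epochs(total_epochs: int,
                     percents: Sequence[int] = (60, 80, 90)) -> List[int]:
    """Convert percentage milestones into epoch indices (rounded down).

    The 60/80/90 percent defaults are the values quoted for UTKFace and
    CIFAR-100; integer arithmetic avoids floating-point rounding surprises.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    return [(p * total_epochs) // 100 for p in percents]


if __name__ == "__main__":
    # 100 is a placeholder: the total (fine-tuning plus GMP) is not quoted.
    print(milestone_epochs(100))  # prints [60, 80, 90]
```

In a PyTorch setup, values like these would typically be passed to `torch.optim.SGD` (`lr`, `momentum`, `weight_decay`, `nesterov`) and to `torch.optim.lr_scheduler.MultiStepLR` (`milestones`); whether the paper uses exactly these classes is not stated in the rows above.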