Mind the Graph When Balancing Data for Fairness or Robustness

Authors: Jessica Schrouff, Alexis Bellot, Amal Rannen-Triki, Alan Malek, Isabela Albuquerque, Arthur Gretton, Alexander D'Amour, Silvia Chiappa

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our contributions can be summarized as follows: (1) we display failure modes of data balancing in semi-synthetic tasks and highlight how predicting these failures can be challenging; (2) we introduce conditions for data balancing to attain invariance to undesired dependencies as defined by fairness or robustness criteria; (3) we prove that data balancing does not correspond to removing undesired dependencies from a causal perspective, and can negatively impact fairness or robustness criteria when combined with regularization strategies; and (4) we illustrate how our findings can be used to distinguish between failure modes and identify next steps. (A sketch of the joint-balancing baseline appears below the table.)
Researcher Affiliation | Collaboration | Jessica Schrouff (Google DeepMind, schrouff@google.com); Alexis Bellot (Google DeepMind); Amal Rannen-Triki (Google DeepMind); Alan Malek (Google DeepMind); Isabela Albuquerque (Google DeepMind); Arthur Gretton (Google DeepMind; Gatsby Unit, UCL); Alexander D'Amour (Google DeepMind); Silvia Chiappa (Google DeepMind)
Pseudocode | No | The paper includes mathematical derivations and a Python code snippet embedded as a figure (Figure 7) that describes a specific simulation, but it does not contain a formal 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | Our work uses open access datasets to illustrate a baseline method for mitigating undesired dependencies. There is no specific code contribution in our experiments.
Open Datasets | Yes | This work uses the MNIST [45, 17, http://yann.lecun.com/exdb/mnist/], Amazon reviews [53], ImageNet [16, https://image-net.org/] and CelebA [46, http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html] datasets, which are all openly accessible and can be used for research purposes.
Dataset Splits | Yes | We sample n = 30,000 samples from P_t, as well as a dataset jointly balanced on Y and Z (Q, n = 30,000). We also sample test data based on a ground truth P_0 generated with P_0(Z = 0 | Y) = 0.5 (n = 2,000). (See the sampling sketch below the table.)
Hardware Specification | Yes | For the small convolutional networks, training was performed with 4 GPUs (V100) and evaluation used 1 GPU per model instance. BERT used 2 Tensor Processing Units (TPUs) for training and 1 TPU for evaluation. For all other models, we used 4 TPUs for training and 1 TPU or GPU (P100) for evaluation.
Software Dependencies | No | The paper states 'All other models are trained from scratch in our code infrastructure written in Python and JAX [7]' and 'The results are then analyzed with Python and the numpy [30], matplotlib [32, https://matplotlib.org/] and pandas [50, https://pandas.pydata.org/] packages.' While programming languages and libraries are named, specific version numbers for these software dependencies are not provided.
Experiment Setup | Yes | We include a hyper-parameter search over the learning rate (5 values in log-scale between 9e-5 and 0.1) coupled with a batch size search over sizes of 128, 256 and 512 examples. In terms of regularization, the small convolutional networks include dropout in the dense layers (search over 0.1, 0.2, 0.3)... We impose an L2-regularization of 1e-4 during training for all architectures. (A sketch of this search grid appears below the table.)
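
To make the joint-balancing baseline referenced in the Research Type row concrete, here is a minimal Python/numpy sketch. It is not the authors' code: the function name `balance_jointly` and the resample-with-replacement strategy are illustrative assumptions. It only shows, operationally, what "a dataset jointly balanced on Y and Z" means.

```python
import numpy as np

def balance_jointly(x, y, z, n_samples, seed=0):
    """Resample (x, y, z) so every (Y, Z) cell is equally represented.

    Illustrative sketch: each of the |Y| * |Z| cells contributes
    n_samples // (|Y| * |Z|) examples, drawn with replacement.
    """
    rng = np.random.default_rng(seed)
    cells = [(yv, zv) for yv in np.unique(y) for zv in np.unique(z)]
    per_cell = n_samples // len(cells)
    idx = np.concatenate([
        rng.choice(np.flatnonzero((y == yv) & (z == zv)), per_cell, replace=True)
        for yv, zv in cells
    ])
    rng.shuffle(idx)  # avoid cell-ordered batches downstream
    return x[idx], y[idx], z[idx]
```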
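
The Dataset Splits row quotes sampling from a skewed training distribution P_t and a uniform ground truth P_0. Below is a hedged sketch of how such semi-synthetic (Y, Z) splits could be drawn; the fair Bernoulli label and the training skew P_t(Z = 0 | Y) = (0.9, 0.1) are assumed placeholders, not values reported in the paper.

```python
import numpy as np

def sample_yz(n, p_z0_given_y, rng):
    """Draw n (Y, Z) pairs with Y ~ Bernoulli(0.5) and a chosen P(Z = 0 | Y)."""
    y = rng.integers(0, 2, size=n)
    z = (rng.random(n) >= np.asarray(p_z0_given_y)[y]).astype(int)
    return y, z

rng = np.random.default_rng(0)
# Training data from P_t with an undesired Y-Z dependence (skew is assumed).
y_tr, z_tr = sample_yz(30_000, p_z0_given_y=[0.9, 0.1], rng=rng)
# Test data from the ground truth P_0 with P_0(Z = 0 | Y) = 0.5 (n = 2,000).
y_te, z_te = sample_yz(2_000, p_z0_given_y=[0.5, 0.5], rng=rng)
```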
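
The Experiment Setup row translates directly into a small search grid, reconstructed below under stated assumptions: the quoted values give the learning-rate endpoints (5 log-spaced values between 9e-5 and 0.1), the batch sizes, the dropout rates, and the fixed L2 penalty, but the `train_and_evaluate` hook and the full cross-product pairing are hypothetical.

```python
import itertools
import numpy as np

# Quoted search space: 5 log-spaced learning rates between 9e-5 and 0.1,
# three batch sizes, and (for the small conv nets) three dropout rates.
learning_rates = np.logspace(np.log10(9e-5), np.log10(0.1), num=5)
batch_sizes = [128, 256, 512]
dropout_rates = [0.1, 0.2, 0.3]
l2_penalty = 1e-4  # fixed L2-regularization for all architectures

for lr, bs, dropout in itertools.product(learning_rates, batch_sizes, dropout_rates):
    # train_and_evaluate(lr, bs, dropout, l2=l2_penalty)  # hypothetical hook
    pass
```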