Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Intersectional Unfairness Discovery
Authors: Gezheng Xu, Qi Chen, Charles Ling, Boyu Wang, Changjian Shui
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world text and image datasets demonstrate a diverse and efficient discovery of BGGN. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Western Ontario 2University of Toronto 3Vector Institute. Correspondence to: Boyu Wang <EMAIL>, Changjian Shui <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Bias Guided Generative Network (BGGN) |
| Open Source Code | Yes | The Code is available at: https://github.com/xugezheng/BGGN. |
| Open Datasets | Yes | CelebA (Image) (Liu et al., 2015) A face image dataset containing 200K images. ... Toxic (Text) (Borkan et al., 2019). The main task of this dataset is to predict the toxicity of text comments. |
| Dataset Splits | Yes | We split the data into Observation (or training) and Holdout datasets, where there is no intersectional sensitive attribute overlap between these two sub-datasets. ... After obtaining this enriched dataset Dbias with bias value, we randomly split it into an Observation set (70%) and a Holdout set (30%) to train the bias value predictor \ Lf(a) and the generator, with NO sensitive attributes overlapping. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided. |
| Software Dependencies | No | The paper mentions 'DistilBERT' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We train f(x) for 3 epochs with a batch size of 64. We utilize the Adam optimizer and fix the learning rate at 1e-4 for both the backbone model and classifier. ... we train the predictor for 60 epochs using MSE loss and Adam optimizer with a learning rate of 1e-3. ... We first (pre-)train the vanilla generative model for 5 epochs with Adam optimizer and set the learning rate at 1e-3. ... We conducted 500 sampling iterations, with a batch size of 128 for each sampling. ... set a relatively small learning rate, with 2e-5 for the encoder and 1e-5 for the decoder. ... We set the resample number as 10, and the filter proportion as 0.2 on the CelebA dataset. |
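
The Dataset Splits row describes a 70%/30% Observation/Holdout split in which no sensitive-attribute combination appears in both sets. A minimal Python sketch of one way to build such a split follows. It is not taken from the authors' repository: the function name `split_by_attribute`, the record layout, and the choice to apply the 70% fraction to attribute combinations rather than to individual samples are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_attribute(records, key, observation_frac=0.7, seed=0):
    """Split records so that no attribute combination lands in both sets.

    Hypothetical helper: `key` maps a record to its (hashable, sortable)
    sensitive-attribute combination. The fraction is applied to the set of
    distinct combinations, which is an assumption, not the paper's stated rule.
    """
    # Group records by their sensitive-attribute combination.
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    # Shuffle the distinct combinations deterministically, then cut at 70%.
    combos = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(combos)
    n_obs = round(observation_frac * len(combos))

    observation = [r for c in combos[:n_obs] for r in groups[c]]
    holdout = [r for c in combos[n_obs:] for r in groups[c]]
    return observation, holdout


# Toy data: 20 distinct attribute combinations, 10 records each.
records = [
    {"attrs": (i % 2, (i // 2) % 2, (i // 4) % 5), "bias": 0.0}
    for i in range(200)
]
observation, holdout = split_by_attribute(records, key=lambda r: r["attrs"])

# The two sets share no attribute combination.
shared = {r["attrs"] for r in observation} & {r["attrs"] for r in holdout}
```

Splitting at the level of attribute combinations, rather than individual rows, is what guarantees the disjointness: a row-level random split would almost always place the same combination on both sides.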