Feature-Wise Bias Amplification
Authors: Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on synthetic and real data demonstrate that these algorithms consistently lead to reduced bias without harming accuracy, in some cases eliminating predictive bias altogether while providing modest gains in accuracy. |
| Researcher Affiliation | Academia | Klas Leino, Matt Fredrikson, Emily Black, Shayak Sen, & Anupam Datta (Carnegie Mellon University) |
| Pseudocode | No | The paper describes algorithms (Feature parity, Experts) through textual explanation and mathematical equations (Equation 6, Equation 7) but does not provide structured pseudocode blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | "For example, for a VGG16 network trained on CelebA (Liu et al., 2015) to predict the attractive label, our approach removed 95% of the bias in predictions." and "We created a binary classification problem from CIFAR10 (Krizhevsky & Hinton, 2009) from the bird and frog classes." |
| Dataset Splits | No | "Logistic regression measurements were obtained by averaging over 20 pseudorandom training runs on a randomly-selected stratified train/test split. Experiments on deep networks use the training/test split provided by the respective dataset authors." (Explanation: The paper mentions train/test splits, but does not detail a separate validation split or explain how models were tuned beyond the implicit use of repeated training runs and the datasets' existing splits.) |
| Hardware Specification | No | The paper discusses software used for training (e.g., Keras 2 with Theano backend, scikit-learn's SGDClassifier estimator) but does not provide any specific details about the hardware specifications (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | "For the logistic regression experiments, we used scikit-learn's SGDClassifier estimator to train each model using the logistic loss function. Logistic regression measurements were obtained by averaging over 20 pseudorandom training runs on a randomly-selected stratified train/test split. Experiments involving experts selected α, β using grid search over the possible values that minimize bias subject to not harming accuracy as described in Section 4. Similarly, experiments involving ℓ1 regularization use a grid search to select the regularization parameter, optimizing for the same criteria used to select α, β. Experiments on deep networks use the training/test split provided by the respective dataset authors. Models were trained until convergence using Keras 2 with the Theano backend." (Explanation: The paper names "scikit-learn" and "Keras 2 with the Theano backend", but gives no version numbers for scikit-learn or Theano, which are required for full reproducibility.) |
| Experiment Setup | Yes | "Experiments involving experts selected α, β using grid search over the possible values that minimize bias subject to not harming accuracy as described in Section 4. Similarly, experiments involving ℓ1 regularization use a grid search to select the regularization parameter, optimizing for the same criteria used to select α, β." |