The Variational Fair Autoencoder
Authors: Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We discuss how these architectures can be efficiently trained on data and show in experiments that this method is more effective than previous work in removing unwanted sources of variation while maintaining informative latent representations. |
| Researcher Affiliation | Academia | Machine Learning Group, University of Amsterdam; Department of Computer Science, University of Toronto; Canadian Institute for Advanced Research (CIFAR); University of California, Irvine |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For the fairness task we experimented with three datasets that were previously used by Zemel et al. (2013). The German dataset is the smallest one with 1000 data points and the objective is to predict whether a person has a good or bad credit rating. The sensitive variable is the gender of the individual. The Adult income dataset contains 45,222 entries and describes whether an account holder has over $50,000 in their account. The sensitive variable is age. Both of these are obtained from the UCI machine learning repository (Frank & Asuncion, 2010). The health dataset is derived from the Heritage Health Prize. It is the largest of the three datasets with 147,473 entries. The task is to predict whether a patient will spend any days in the hospital in the next year and the sensitive variable is the age of the individual. For the domain adaptation task we used the Amazon reviews dataset (with similar preprocessing) that was also employed by Chen et al. (2012) and Ganin et al. (2015). For the general task of learning invariant representations we used the Extended Yale B dataset, which was also employed in a similar fashion by Li et al. (2014). |
| Dataset Splits | Yes | We use the same train/test/validation splits as Zemel et al. (2013) for our experiments. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2015)' for optimization but does not provide specific version numbers for any software libraries, frameworks, or environments used. |
| Experiment Setup | Yes | For the Adult dataset both encoders, for z1 and z2, and both decoders, for z1 and x, had one hidden layer of 100 units. For the Health dataset we had one hidden layer of 300 units for the z1 encoder and x decoder and one hidden layer of 150 units for the z2 encoder and z1 decoder. For the much smaller German dataset we used 60 hidden units for both encoders and decoders. Finally, for the Amazon reviews and Extended Yale B datasets we had one hidden layer with 500 and 400 units respectively for the z1 encoder and x decoder, and 300 and 100 units respectively for the z2 encoder and z1 decoder. On all of the datasets we used 50 latent dimensions for z1 and z2, except for the small German dataset, where we used 30 latent dimensions for both variables. For the predictive posterior q_φ(y\|z1) we used a simple logistic regression classifier. Optimization of the objective function was done with Adam (Kingma & Ba, 2015) using the default values for the hyperparameters, minibatches of 100 data points and temporal averaging. The MMD penalty was simply multiplied by the minibatch size so as to keep the scale of the penalty similar to the lower bound. Furthermore, the extra strength of the MMD, β, was tuned according to a validation set. The scaling of the supervised cost was low (α = 1) for the Adult, Health and German datasets due to the correlation of s with y. On the Amazon reviews and Extended Yale B datasets, however, the scaling of the supervised cost was higher: α = 100 · (N_source + N_target) / N_source for the Amazon reviews dataset, where N_source and N_target are the source and target minibatch sizes (empirically determined after observing the classification loss on the first few iterations on the first source-target pair), and α = 200 for the Extended Yale B dataset. Similarly, the scaling of the MMD penalty was β = 100 · N_batch for the Amazon reviews dataset and β = 200 · N_batch for the Extended Yale B. |
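The German, Adult, and Health datasets named in the Open Datasets row are all publicly retrievable. As one concrete route, here is a minimal loading sketch for the Adult income dataset, assuming scikit-learn's `fetch_openml`; the paper does not specify a loading path, and the preprocessing that yields its 45,222-entry count is not described, so the details below are illustrative:

```python
from sklearn.datasets import fetch_openml

# Fetch the UCI Adult income dataset via OpenML. The paper's 45,222-entry
# count comes from dropping rows with missing values; that exact
# preprocessing is not specified, so this loading path is an assumption.
adult = fetch_openml("adult", version=2, as_frame=True)
X = adult.data
y = (adult.target == ">50K").astype(int)  # predict income over $50,000
```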
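The Experiment Setup row pins down layer widths and latent sizes, but the paper ships no code, so the following is a minimal sketch of the Adult configuration under the VFAE's conditioning structure (q(z1|x,s), q(z2|z1,y), p(z1|z2,y), p(x|z1,s), q(y|z1)). PyTorch, the Softplus activation, Gaussian output heads, and the input dimensionality `X_DIM` are all assumptions, not the authors' implementation:

```python
import torch.nn as nn

class GaussianMLP(nn.Module):
    """One-hidden-layer network outputting the mean and log-variance of a
    diagonal Gaussian, used here for each VFAE encoder/decoder."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Softplus())
        self.mean = nn.Linear(hidden_dim, out_dim)
        self.logvar = nn.Linear(hidden_dim, out_dim)

    def forward(self, h_in):
        h = self.hidden(h_in)
        return self.mean(h), self.logvar(h)

# Adult configuration: 100 hidden units in every encoder/decoder,
# 50-dimensional z1 and z2. X_DIM is a placeholder; the paper does not
# state the preprocessed input dimensionality.
X_DIM, S_DIM, Y_DIM, Z_DIM, H_DIM = 102, 1, 1, 50, 100

enc_z1 = GaussianMLP(X_DIM + S_DIM, H_DIM, Z_DIM)  # q(z1 | x, s)
enc_z2 = GaussianMLP(Z_DIM + Y_DIM, H_DIM, Z_DIM)  # q(z2 | z1, y)
dec_z1 = GaussianMLP(Z_DIM + Y_DIM, H_DIM, Z_DIM)  # p(z1 | z2, y)
dec_x  = GaussianMLP(Z_DIM + S_DIM, H_DIM, X_DIM)  # p(x | z1, s)
clf_y  = nn.Linear(Z_DIM, 1)                       # q(y | z1): logistic regression head
```

At training time the networks would be composed through reparameterized samples of z1 and z2, with the logistic head trained jointly under the α-weighted supervised cost described in the table.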
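The MMD penalty described above (multiplied by the minibatch size, with a tuned strength β) can be sketched as follows. Note the paper uses a fast random-Fourier-feature approximation of the MMD; this sketch uses the exact biased RBF-kernel estimator instead, and the bandwidth `gamma` and the two group tensors are stand-ins:

```python
import torch

def rbf_mmd2(a, b, gamma=1.0):
    """Biased estimate of squared MMD between samples a, b under an RBF
    kernel. (The paper uses a random-feature approximation instead.)"""
    k = lambda u, v: torch.exp(-gamma * torch.cdist(u, v).pow(2))
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

# z1 samples for the two values of the sensitive variable s
# (random stand-ins here, in place of encoder outputs).
z1_s0 = torch.randn(60, 50)
z1_s1 = torch.randn(40, 50)

n_batch, beta = 100, 1.0  # beta is tuned on a validation set
# As in the paper, multiply by the minibatch size to keep the penalty
# on the same scale as the variational lower bound.
mmd_penalty = beta * n_batch * rbf_mmd2(z1_s0, z1_s1)
```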
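For the reconstructed Amazon-reviews weighting α = 100 · (N_source + N_target) / N_source, a quick worked instance (equal source and target minibatch sizes are assumed purely for illustration):

```python
# Supervised-cost weight for the Amazon reviews runs, from the
# reconstructed formula above. With equal source/target minibatches:
n_source, n_target = 50, 50
alpha = 100 * (n_source + n_target) / n_source  # -> 200.0
```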