The Price of Fair PCA: One Extra Dimension

Authors: Samira Samadi, Uthaipon Tantipongpipat, Jamie H. Morgenstern, Mohit Singh, Santosh Vempala

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show on several real-world data sets, PCA has higher reconstruction error on population A than on B...Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low dimensional representation of the data. ...We then evaluate the empirical performance of this algorithm on several human-centric data sets.
Researcher Affiliation | Academia | Samira Samadi, Georgia Tech, ssamadi6@gatech.edu; Uthaipon Tantipongpipat, Georgia Tech, tao@gatech.edu; Jamie Morgenstern, Georgia Tech, jamiemmt.cs@gatech.edu; Mohit Singh, Georgia Tech, mohitsinghr@gmail.com; Santosh Vempala, Georgia Tech, vempala@cc.gatech.edu
Pseudocode | Yes | Algorithm 1: Fair PCA
Open Source Code | No | The paper states, 'The details of the algorithm are given in the full version of this work,' but it neither says the code is open source nor links to a code repository for the described methodology.
Open Datasets | Yes | We use two common human-centric data sets for our experiments. The first one is labeled faces in the wild (LFW) [Huang et al., 2007], the second is the Default Credit data set [Yeh and Lien, 2009].
Dataset Splits | No | The paper mentions subsampling the LFW data set ('sampling 1000 faces with men and women equiprobably') but does not give training, validation, or test splits (e.g., percentages or counts).
Hardware Specification | No | The paper mentions runtime performance ('Our MW can handle data of dimension up to a thousand with running time in less than a minute.') but does not specify any hardware details such as CPU/GPU models or memory.
Software Dependencies | No | The paper discusses solving SDP and LP problems and using a multiplicative weight (MW) update method. However, it does not name or give version numbers for any software, libraries, or solvers used.
Experiment Setup | No | The paper describes data preprocessing steps such as mean centering and normalization ('We preprocess all data to have its mean at the origin. For the LFW data, we normalized each pixel value by 1/255. ... For the credit data, we normalized the variance of each attribute to be equal to 1.'). However, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings for the Fair PCA algorithm itself, and it refers to 'appropriately tuning one parameter in MW' without specifying that parameter's value.
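
The preprocessing quoted under Experiment Setup, and the per-group reconstruction error quoted under Research Type, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the synthetic groups A and B, the target dimension d = 2, and the handling of constant columns are all assumptions made for illustration, and the projection used here is standard PCA, not the Fair PCA algorithm.

```python
import numpy as np

def center(X):
    """Shift the data so that every column has mean zero."""
    return X - X.mean(axis=0)

def preprocess_pixels(X):
    """LFW-style step: scale pixel values by 1/255, then mean-center."""
    return center(np.asarray(X, dtype=float) / 255.0)

def preprocess_unit_variance(X):
    """Credit-style step: mean-center, then scale each attribute to variance 1."""
    Xc = center(np.asarray(X, dtype=float))
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # assumption: constant columns are left unscaled
    return Xc / std

def pca_projection(X, d):
    """Rank-d projection matrix onto the top-d principal directions of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T
    return V @ V.T

def avg_reconstruction_error(X, P):
    """Average squared reconstruction error per sample: ||X - XP||_F^2 / n."""
    return np.linalg.norm(X - X @ P, "fro") ** 2 / X.shape[0]

# Hypothetical synthetic data: two groups with different dominant directions.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
B = rng.normal(size=(50, 5)) * np.array([0.1, 0.5, 1.0, 2.0, 3.0])

X = preprocess_unit_variance(np.vstack([A, B]))
XA, XB = X[:200], X[200:]

# Standard PCA fitted on the pooled data, then evaluated on each group.
P = pca_projection(X, 2)
print("group A error:", avg_reconstruction_error(XA, P))
print("group B error:", avg_reconstruction_error(XB, P))
```

Comparing the two printed values shows how a single PCA projection can serve the two groups unequally, which is the kind of gap the Research Type row describes for populations A and B.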