The Price of Fair PCA: One Extra Dimension
Authors: Samira Samadi, Uthaipon Tantipongpipat, Jamie H. Morgenstern, Mohit Singh, Santosh Vempala
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show on several real-world data sets, PCA has higher reconstruction error on population A than on B...Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low dimensional representation of the data. ...We then evaluate the empirical performance of this algorithm on several human-centric data sets. |
| Researcher Affiliation | Academia | Samira Samadi, Georgia Tech, ssamadi6@gatech.edu; Uthaipon Tantipongpipat, Georgia Tech, tao@gatech.edu; Jamie Morgenstern, Georgia Tech, jamiemmt.cs@gatech.edu; Mohit Singh, Georgia Tech, mohitsinghr@gmail.com; Santosh Vempala, Georgia Tech, vempala@cc.gatech.edu |
| Pseudocode | Yes | Algorithm 1: Fair PCA |
| Open Source Code | No | The paper states, 'The details of the algorithm are given in the full version of this work,' but does not provide an explicit statement about open-sourcing the code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | We use two common human-centric data sets for our experiments. The first one is labeled faces in the wild (LFW) [Huang et al., 2007], the second is the Default Credit data set [Yeh and Lien, 2009]. |
| Dataset Splits | No | The paper mentions subsampling the LFW data set so that both groups are equally represented ('sampling 1000 faces with men and women equiprobably'), but it does not give training, validation, or test splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper mentions runtime performance ('Our MW can handle data of dimension up to a thousand with running time in less than a minute.') but does not specify any hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper discusses solving SDP and LP problems, and using a multiplicative weight (MW) update method. However, it does not provide specific version numbers for any software, libraries, or solvers used. |
| Experiment Setup | No | The paper describes data preprocessing steps such as mean centering and normalization ('We preprocess all data to have its mean at the origin. For the LFW data, we normalized each pixel value by 1/255. ... For the credit data, we normalized the variance of each attribute to be equal to 1.'). However, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings for the Fair PCA algorithm itself, and it refers to 'appropriately tuning one parameter in MW' without specifying that parameter's value. |
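
The preprocessing quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' code: the function names, the toy data, the order of the steps, and the use of population (rather than sample) variance are all assumptions, since the paper does not specify them.

```python
from statistics import fmean, pstdev


def center_columns(rows):
    """Shift every column (attribute) so that its mean is at the origin."""
    means = [fmean(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def scale_pixels(rows, factor=1 / 255):
    """Multiply every raw pixel value by 1/255 (the LFW step)."""
    return [[x * factor for x in row] for row in rows]


def unit_variance_columns(rows):
    """Rescale every column to variance 1 (the credit-data step).

    Assumption: population standard deviation is used, and constant
    columns are left unchanged to avoid dividing by zero.
    """
    stds = [pstdev(col) for col in zip(*rows)]
    return [[x / s if s > 0 else x for x, s in zip(row, stds)] for row in rows]


# Toy stand-ins for the real data sets (hypothetical values).
faces = [[0, 255], [255, 0], [51, 102]]                # raw pixel intensities
credit = [[1.0, 200.0], [3.0, 400.0], [5.0, 900.0]]    # numeric attributes

faces_ready = center_columns(scale_pixels(faces))
credit_ready = unit_variance_columns(center_columns(credit))
```

After these calls, every column of `faces_ready` and `credit_ready` has mean zero, and every column of `credit_ready` has unit variance. The PCA or Fair PCA step itself is not shown here.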