Convex Formulations for Fair Principal Component Analysis

Authors: Matt Olfat, Anil Aswani
Pages: 663-670

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates. ...we demonstrate their effectiveness using several datasets.
Researcher Affiliation | Academia | Matt Olfat, Anil Aswani (UC Berkeley, Berkeley, CA 94720); molfat@berkeley.edu, aaswani@berkeley.edu
Pseudocode | No | The paper describes mathematical formulations and steps, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper points to supplementary material at 'https://arxiv.org/pdf/1802.03765.pdf', which is a link to an arXiv paper, not source code. There is no other explicit statement about the release of the authors' source code.
Open Datasets | Yes | We use synthetic and real datasets from the UC Irvine Machine Learning Repository (Lichman 2013)... We use minute-level data from the National Health and Nutrition Examination Survey (NHANES) from 2005-2006 (Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). 2018)
Dataset Splits | Yes | For any SVM run, tuning parameters were chosen using 5-fold cross-validation... After splitting each dataset into separate training (70%) and testing (30%) sets
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions using SVMs and k-means clustering but does not specify the software libraries or their version numbers (e.g., scikit-learn version, PyTorch version).
Experiment Setup | Yes | For any SVM run, tuning parameters were chosen using 5-fold cross-validation, and data was normalized to have unit variance in each field. ...After splitting each dataset into separate training (70%) and testing (30%) sets... with δ = 0 and µ = 0.01... We conduct k-means clustering (with k = 3) on the dimensionality-reduced data.
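
The data-handling steps quoted in the Experiment Setup row (unit-variance normalization of each field, a 70%/30% train/test split, and 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' code: the paper names no software libraries, so the sketch uses only the Python standard library, and the function names, the toy data, the random seeds, and the choice to normalize before splitting are all assumptions. Model fitting (the SVM, the fair PCA step, and k-means with k = 3) is left out.

```python
import random
import statistics


def normalize_unit_variance(rows):
    """Scale each column (field) to unit variance; constant columns are left unchanged."""
    columns = list(zip(*rows))
    scales = []
    for col in columns:
        sd = statistics.pstdev(col)  # population standard deviation (an assumption)
        scales.append(sd if sd > 0 else 1.0)
    return [[x / s for x, s in zip(row, scales)] for row in rows]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows, then split them into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


# Hypothetical toy data: 100 rows, 3 fields on very different scales.
_rng = random.Random(42)
data = [[_rng.gauss(0, 1), _rng.gauss(0, 5), _rng.gauss(0, 0.1)] for _ in range(100)]

normalized = normalize_unit_variance(data)        # unit variance in each field
train_rows, test_rows = train_test_split(normalized)  # 70% / 30%
folds = list(k_fold_indices(len(train_rows), k=5))    # 5-fold CV on the training set
```

Each `(train_idx, val_idx)` pair in `folds` would be used to fit a model on one part of the training set and score it on the other when choosing tuning parameters; the held-out 30% stays untouched until the final evaluation.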