
Convex Formulations for Fair Principal Component Analysis

Authors: Matt Olfat, Anil Aswani (pp. 663-670)

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates. ...we demonstrate their effectiveness using several datasets.
Researcher Affiliation	Academia	Matt Olfat, Anil Aswani; UC Berkeley, Berkeley, CA 94720; molfat@berkeley.edu, aaswani@berkeley.edu
Pseudocode	No	The paper describes mathematical formulations and steps, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	No	The paper mentions supplementary material at 'https://arxiv.org/pdf/1802.03765.pdf', which is a link to an arXiv paper, not source code. There is no other explicit statement about the release of the authors' source code.
Open Datasets	Yes	We use synthetic and real datasets from the UC Irvine Machine Learning Repository (Lichman 2013)... We use minute-level data from the National Health and Nutrition Examination Survey (NHANES) from 2005-2006 (Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). 2018)
Dataset Splits	Yes	For any SVM run, tuning parameters were chosen using 5-fold cross-validation... After splitting each dataset into separate training (70%) and testing (30%) sets
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies	No	The paper mentions using SVMs and k-means clustering but does not specify the software libraries or their version numbers (e.g., scikit-learn version, PyTorch version).
Experiment Setup	Yes	For any SVM run, tuning parameters were chosen using 5-fold cross-validation, and data was normalized to have unit variance in each field. ...After splitting each dataset into separate training (70%) and testing (30%) sets... with δ = 0 and µ = 0.01... We conduct k-means clustering (with k = 3) on the dimensionality-reduced data.
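
The data-handling steps quoted in the Dataset Splits and Experiment Setup rows (a 70%/30% train/test split, unit-variance normalization of each field, and 5-fold cross-validation) can be sketched as below. This is a minimal illustration, not the authors' code: the paper does not name its software, so the sketch uses only the Python standard library. The function names, the random seed, the toy data, and the choice to estimate scales on the training rows only are all assumptions.

```python
import random
import statistics


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle rows and split them into training and testing lists."""
    rng = random.Random(seed)  # assumption: the paper does not state a seed
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_scales(train_rows):
    """Per-field standard deviation, estimated on the training rows only.

    Assumption: the paper does not say which rows the scales come from.
    """
    columns = list(zip(*train_rows))
    # A constant field has zero standard deviation; fall back to 1.0.
    return [statistics.pstdev(col) or 1.0 for col in columns]


def scale(rows, scales):
    """Divide every field by its scale so each field has unit variance."""
    return [[x / s for x, s in zip(row, scales)] for row in rows]


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Toy data standing in for a real dataset (hypothetical values).
data = [[float(i), float(i % 7) * 10.0] for i in range(100)]

train, test = train_test_split(data)       # 70% / 30% split
scales = fit_scales(train)                 # scales learned from training rows
train_scaled = scale(train, scales)
test_scaled = scale(test, scales)          # reuse the training scales
folds = list(k_fold_indices(len(train_scaled), k=5))
```

Each `(train_idx, val_idx)` pair in `folds` would index into `train_scaled` when comparing candidate tuning parameters; the held-out 30% is touched only for the final evaluation.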