Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Convex Formulations for Fair Principal Component Analysis
Authors: Matt Olfat, Anil Aswani
Pages: 663-670
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates. ...we demonstrate their effectiveness using several datasets. |
| Researcher Affiliation | Academia | Matt Olfat,¹ Anil Aswani¹; ¹UC Berkeley, Berkeley, CA 94720; EMAIL, EMAIL |
| Pseudocode | No | The paper describes mathematical formulations and steps, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions supplementary material at 'https://arxiv.org/pdf/1802.03765.pdf', which is a link to an arXiv paper, not source code. There is no other explicit statement about the release of the authors' source code. |
| Open Datasets | Yes | We use synthetic and real datasets from the UC Irvine Machine Learning Repository (Lichman 2013)... We use minute-level data from the National Health and Nutrition Examination Survey (NHANES) from 2005-2006 (Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). 2018) |
| Dataset Splits | Yes | For any SVM run, tuning parameters were chosen using 5-fold cross-validation... After splitting each dataset into separate training (70%) and testing (30%) sets |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using SVMs and k-means clustering but does not specify the software libraries or their version numbers (e.g., scikit-learn version, PyTorch version). |
| Experiment Setup | Yes | For any SVM run, tuning parameters were chosen using 5-fold cross-validation, and data was normalized to have unit variance in each field. ...After splitting each dataset into separate training (70%) and testing (30%) sets... with δ = 0 and µ = 0.01... We conduct k-means clustering (with k = 3) on the dimensionality-reduced data. |
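
The data-handling steps quoted in the Dataset Splits and Experiment Setup rows (a 70%/30% train/test split, unit-variance normalization of each field, and 5-fold cross-validation) can be sketched as follows. This is a minimal illustration and not the authors' code: the paper does not name its software, so the sketch uses only the Python standard library. The function names, the random seeds, the toy data, the choice to compute scaling factors on the training set only, and the use of contiguous folds are all assumptions made for the example. The fair PCA step, the SVM, and the k-means clustering are not reproduced here.

```python
import random
import statistics


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into training and testing lists."""
    rng = random.Random(seed)  # seed is an assumption; the paper reports none
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def column_scales(rows):
    """Population standard deviation of each column (1.0 for constant columns)."""
    scales = []
    for column in zip(*rows):
        sd = statistics.pstdev(column)
        scales.append(sd if sd > 0 else 1.0)
    return scales


def rescale(rows, scales):
    """Divide every field by its scale so the fitted columns get unit variance."""
    return [[value / s for value, s in zip(row, scales)] for row in rows]


def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, validation_idx) pairs using contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Hypothetical toy data standing in for a real dataset (100 rows, 3 fields).
data_rng = random.Random(42)
rows = [
    [data_rng.gauss(0, 1), data_rng.gauss(5, 10), data_rng.gauss(-2, 0.1)]
    for _ in range(100)
]

train_rows, test_rows = train_test_split(rows)       # 70% / 30%
scales = column_scales(train_rows)                   # assumption: fit on training data only
train_scaled = rescale(train_rows, scales)
test_scaled = rescale(test_rows, scales)
folds = list(k_fold_indices(len(train_scaled), 5))   # 5-fold CV over the training set
```

Each `(train_idx, validation_idx)` pair in `folds` would then be used to fit and score one candidate setting of the SVM tuning parameters; the quoted text does not say whether the original experiments shuffled or stratified the folds.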