Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

Authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
Researcher Affiliation | Academia | Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun; Kim Jaechul Graduate School of AI, KAIST; {jh_lee00, jhs4015, yunseyoung, chulhee.yun}@kaist.ac.kr
Pseudocode | Yes | The pseudocode of our algorithm is shown in Algorithms 1 and 2.
Open Source Code | Yes | The code for all experiments is available at github.com/HanseulJo/fair-streaming-pca.
Open Datasets | Yes | We evaluate the efficacy of our proposed FNPM on the CelebA dataset (Liu et al., 2015b). For the sake of completeness, we conduct a quantitative evaluation of our algorithm on UCI datasets (Adult Income, COMPAS, German Credit).
Dataset Splits | Yes | We adopt the predefined train-validation split and run our algorithm only on the training set for 5 iterations with block sizes of b = B = 32,000. Then, using the output V of FNPM, we project images selected from the validation set.
Hardware Specification | Yes | All experiments were performed on Apple 2020 Mac mini M1 with 16GB RAM.
Software Dependencies | No | We implement our FNPM using Python, JAX, NumPy (Bradbury et al., 2023; Harris et al., 2020), and PyTorch (Paszke et al., 2017). This lists the software used but gives no version numbers, which are needed for reproducibility.
Experiment Setup | Yes | For each color channel, we project the data onto a k = 1000-dimensional subspace while nullifying the m = 2 leading eigenvectors of the covariance difference. We run our algorithm only on the training set for 5 iterations with block sizes of b = B = 32,000. Ours (offline, m = 15).
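The Experiment Setup row describes projecting the data onto a k-dimensional subspace while nullifying the m leading eigenvectors of the covariance difference. Below is a minimal offline sketch of that idea in NumPy. It is not the authors' FNPM, which operates on streaming data in blocks (see Algorithms 1 and 2 of the paper). It rests on several assumptions: the two sensitive groups are given as sample matrices `X0` and `X1`, the covariance difference is taken as Σ1 − Σ0, "leading" means largest eigenvalue magnitude, and the function name `fair_pca_offline` is hypothetical. It covers only the covariance-difference nullification mentioned in this row.

```python
import numpy as np


def fair_pca_offline(X0, X1, k, m):
    """Hypothetical offline sketch (not the paper's FNPM).

    Returns V (d x k), the top-k principal directions of the pooled data
    restricted to the orthogonal complement of W, and W (d x m), the m
    eigenvectors of the covariance difference with the largest |eigenvalue|.
    """
    # Group-conditional sample covariances; rows are samples.
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    diff = S1 - S0  # symmetric covariance difference (assumed Sigma1 - Sigma0)
    d = diff.shape[0]
    assert 0 <= m and k <= d - m, "need k <= d - m"

    # eigh returns ascending eigenvalues; reorder by magnitude (assumption).
    evals, evecs = np.linalg.eigh(diff)
    order = np.argsort(np.abs(evals))[::-1]
    W = evecs[:, order[:m]]  # directions to nullify
    Q = evecs[:, order[m:]]  # orthonormal basis of the complement of span(W)

    # PCA of the pooled covariance, computed inside span(Q) only.
    C = np.cov(np.vstack([X0, X1]), rowvar=False)
    ev_c, U = np.linalg.eigh(Q.T @ C @ Q)
    top = np.argsort(ev_c)[::-1][:k]
    V = Q @ U[:, top]  # orthonormal columns, orthogonal to W by construction
    return V, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X0 = rng.normal(size=(500, 10))
    X1 = rng.normal(size=(400, 10)) * np.linspace(0.5, 2.0, 10)
    V, W = fair_pca_offline(X0, X1, k=3, m=2)
    print(V.shape, W.shape)
```

The sketch reuses the remaining eigenvectors of the covariance difference as an orthonormal basis of the complement. As a result, the returned V is orthogonal to the nullified directions by construction, rather than only approximately so. The dense d × d eigendecompositions here are practical only for moderate d. That limitation is what a block-wise streaming method is meant to avoid.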