Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
Authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets. |
| Researcher Affiliation | Academia | Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun; Kim Jaechul Graduate School of AI, KAIST; {jh_lee00, jhs4015, yunseyoung, chulhee.yun}@kaist.ac.kr |
| Pseudocode | Yes | The pseudocode of our algorithm is shown in Algorithms 1 and 2. |
| Open Source Code | Yes | The code for all experiments is available at github.com/HanseulJo/fair-streaming-pca. |
| Open Datasets | Yes | We evaluate the efficacy of our proposed FNPM on the CelebA dataset (Liu et al., 2015b). For the sake of completeness, we conduct a quantitative evaluation of our algorithm on UCI datasets (Adult Income, COMPAS, German Credit). |
| Dataset Splits | Yes | We adopt the predefined train-validation split and run our algorithm only on the training set for 5 iterations with block sizes of b = B = 32,000. Then, using the output V of FNPM, we project images selected from the validation set. |
| Hardware Specification | Yes | All experiments were performed on Apple 2020 Mac mini M1 with 16GB RAM. |
| Software Dependencies | No | We implement our FNPM using Python JAX NumPy module (Bradbury et al., 2023; Harris et al., 2020) and PyTorch (Paszke et al., 2017). The paper lists software names but does not provide the specific version numbers needed for reproducibility. |
| Experiment Setup | Yes | For each channel of colors, we project the data onto a k = 1000-dimensional subspace while nullifying m = 2 leading eigenvectors of covariance difference. [...] run our algorithm only on the training set for 5 iterations with block sizes of b = B = 32,000. [...] Ours (offline, m=15). |
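
The Experiment Setup row describes projecting data onto a k-dimensional subspace while nullifying the m leading eigenvectors of the covariance difference between groups. The sketch below illustrates that idea offline in NumPy. It is not the authors' streaming FNPM: it assumes the whole dataset fits in memory, a binary group label `s`, exact eigendecompositions, and that "leading" means largest absolute eigenvalue. The function name `nullify_then_pca` and its arguments are hypothetical.

```python
import numpy as np

def nullify_then_pca(X, s, k, m):
    """Offline illustration (assumption, not the paper's streaming algorithm).

    X: (n, d) data matrix, s: (n,) binary group labels in {0, 1}.
    Returns V (d, k), an orthonormal basis orthogonal to the m nullified
    directions, and N (d, m), the nullified directions themselves.
    """
    d = X.shape[1]
    # Covariance difference between the two groups.
    cov0 = np.cov(X[s == 0], rowvar=False)
    cov1 = np.cov(X[s == 1], rowvar=False)
    w, U = np.linalg.eigh(cov1 - cov0)
    # Assumption: "leading" = largest eigenvalues in absolute value.
    N = U[:, np.argsort(-np.abs(w))[:m]]
    # A complete QR of N gives an orthonormal basis of R^d whose last
    # d - m columns span the orthogonal complement of span(N).
    Q_full, _ = np.linalg.qr(N, mode="complete")
    C = Q_full[:, m:]
    # Ordinary PCA of the pooled covariance, restricted to that complement.
    cov = np.cov(X, rowvar=False)
    w_c, W = np.linalg.eigh(C.T @ cov @ C)
    V = C @ W[:, np.argsort(-w_c)[:k]]
    return V, N
```

With the values quoted in the table, a call would look like `nullify_then_pca(X, s, k=1000, m=2)` per color channel. Restricting PCA to an explicit basis of the complement, rather than deflating the covariance, keeps the returned `V` orthogonal to the nullified directions even when the pooled covariance is rank-deficient.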