Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
Authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets. |
| Researcher Affiliation | Academia | Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun; Kim Jaechul Graduate School of AI, KAIST |
| Pseudocode | Yes | The pseudocode of our algorithm is shown in Algorithms 1 and 2. |
| Open Source Code | Yes | The code for all experiments is available at github.com/HanseulJo/fair-streaming-pca. |
| Open Datasets | Yes | We evaluate the efficacy of our proposed FNPM on the CelebA dataset (Liu et al., 2015b). For the sake of completeness, we conduct a quantitative evaluation of our algorithm on UCI datasets (Adult Income, COMPAS, German Credit). |
| Dataset Splits | Yes | We adopt the predefined train-validation split and run our algorithm only on the training set for 5 iterations with block sizes of b = B = 32,000. Then, using the output V of FNPM, we project images selected from the validation set. |
| Hardware Specification | Yes | All experiments were performed on Apple 2020 Mac mini M1 with 16GB RAM. |
| Software Dependencies | No | We implement our FNPM using Python JAX NumPy module (Bradbury et al., 2023; Harris et al., 2020) and PyTorch (Paszke et al., 2017). The paper names the software used but does not give the version numbers needed for reproducibility. |
| Experiment Setup | Yes | For each channel of colors, we project the data onto a k = 1000-dimensional subspace while nullifying m = 2 leading eigenvectors of covariance difference. ... run our algorithm only on the training set for 5 iterations with block sizes of b = B = 32,000. ... Ours (offline, m=15). |
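
The Experiment Setup row describes projecting data onto a k-dimensional subspace while nullifying the m leading eigenvectors of the covariance difference. The sketch below illustrates that idea in an offline, full-batch setting with plain NumPy. It is not the authors' FNPM implementation (which is streaming and block-based). The function name `fair_pca_offline`, the 0/1 group encoding, the choice of "leading" as largest absolute eigenvalue, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fair_pca_offline(X, groups, k, m):
    """Hypothetical offline sketch: top-k PCA restricted to the orthogonal
    complement of the m leading eigenvectors of the covariance difference."""
    X = X - X.mean(axis=0)
    # Group-conditional covariances (assumption: two groups labeled 0 and 1).
    S0 = np.cov(X[groups == 0], rowvar=False)
    S1 = np.cov(X[groups == 1], rowvar=False)
    Q = S1 - S0  # symmetric covariance difference

    # Assumption: "leading" means largest eigenvalue in absolute value.
    w, U = np.linalg.eigh(Q)
    U_m = U[:, np.argsort(-np.abs(w))[:m]]  # d x m, orthonormal columns

    # Projector onto the orthogonal complement of span(U_m).
    d = X.shape[1]
    P = np.eye(d) - U_m @ U_m.T

    # PCA of the projected covariance; requires k <= d - m.
    S_proj = P @ np.cov(X, rowvar=False) @ P
    S_proj = (S_proj + S_proj.T) / 2  # remove floating-point asymmetry
    w_s, V = np.linalg.eigh(S_proj)
    V_k = V[:, np.argsort(-w_s)[:k]]  # d x k, orthonormal columns
    return V_k, U_m

# Synthetic data: group 1 has inflated variance in the first two features.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 6))
X[groups == 1, :2] *= 3.0

V_k, U_m = fair_pca_offline(X, groups, k=2, m=2)
Z = (X - X.mean(axis=0)) @ V_k  # projected data, shape (400, 2)
```

The returned `V_k` is orthonormal and orthogonal to the nullified directions `U_m`, so the projected data `Z` carries no component along the directions where the two groups' covariances differ most.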