Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis

Authors: Jiayu Su, David A Knowles, Raúl Rabadán

NeurIPS 2024

Reproducibility assessment: each variable below is listed with its result and the LLM's supporting response.
Research Type: Experimental. "We demonstrate sisPCA's connections with autoencoders and regularized linear regression and showcase its ability to identify and separate hidden data structures through extensive applications, including breast cancer diagnosis from image features, learning aging-associated DNA methylation changes, and single-cell analysis of malaria infection."
Researcher Affiliation: Academia. "Jiayu Su (1,2,5), David A. Knowles (2,4,5), Raúl Rabadán (1,2,3). 1 Program for Mathematical Genomics; 2 Department of Systems Biology, Columbia University; 3 Department of Biomedical Informatics, Columbia University; 4 Department of Computer Science, Columbia University; 5 New York Genome Center"
Pseudocode: Yes. "Algorithm 1: Solving sisPCA-linear using alternating eigendecomposition"
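The alternating eigendecomposition named in Algorithm 1 can be sketched at a high level. The following is a schematic simplification, not the authors' exact objective: it assumes each subspace's loadings maximize a linear-HSIC-style alignment with a supervision kernel, minus a penalty for overlap with the other subspaces, and all names (`sispca_linear_sketch`, `center_kernel`, `lam`) are hypothetical.

```python
import numpy as np

def center_kernel(K):
    """Double-center an n-by-n kernel matrix (HKH with H = I - 11^T/n)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def sispca_linear_sketch(X, target_kernels, d_sub=2, lam=1.0, n_iter=10, seed=0):
    """Schematic alternating eigendecomposition for a sisPCA-like objective.

    Each subspace k gets a loading matrix U_k that maximizes a linear-HSIC
    alignment with its target kernel minus a linear-HSIC overlap with the
    other subspaces (a hypothetical simplification of the paper's
    Algorithm 1, for illustration only).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center the features
    Ks = [center_kernel(K) for K in target_kernels]
    # random orthonormal initialization of each subspace's loadings
    Us = [np.linalg.qr(rng.standard_normal((p, d_sub)))[0] for _ in Ks]
    for _ in range(n_iter):
        for k in range(len(Us)):
            # linear kernel of the other subspaces' projections (the penalty)
            K_pen = sum(
                center_kernel(Xc @ Us[j] @ Us[j].T @ Xc.T)
                for j in range(len(Us)) if j != k
            )
            # with the other subspaces fixed, U_k solves an eigenproblem
            M = Xc.T @ (Ks[k] - lam * K_pen) @ Xc
            w, V = np.linalg.eigh((M + M.T) / 2)
            Us[k] = V[:, -d_sub:]                # top-d eigenvectors
    return Us
```

Holding all other subspaces fixed turns each update into a symmetric eigenproblem, which is what makes the alternating scheme cheap for the linear variant.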
Open Source Code: Yes. "A Python implementation of sisPCA is available on GitHub at https://github.com/JiayuSuPKU/sispca (DOI 10.5281/zenodo.13932660). The repository also includes notebooks to reproduce results in this paper."
Open Datasets: Yes. "uciml/breast-cancer-wisconsin-data, CC BY-NC-SA 4.0 license."
Dataset Splits: No. The paper describes preprocessing steps, such as selecting the top 2,000 highly variable genes for the scRNA-seq data and downsampling the TCGA dataset, but it does not specify explicit training, validation, and test splits.
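The highly-variable-gene step mentioned above can be illustrated with a minimal stand-in. Real single-cell pipelines typically use a dispersion-based selection (e.g. from scanpy or scvi-tools); this hypothetical helper (`top_variable_genes`) simply ranks genes by plain variance, which conveys the idea without matching the paper's exact procedure.

```python
import numpy as np

def top_variable_genes(X, n_top=2000):
    """Keep the n_top most variable columns (genes) of a cells-by-genes
    matrix. A plain-variance stand-in for highly-variable-gene selection."""
    var = X.var(axis=0)
    idx = np.argsort(var)[::-1][:min(n_top, X.shape[1])]
    return X[:, idx], idx
```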
Hardware Specification: Yes. "We ran all provided notebooks (https://github.com/JiayuSuPKU/sispca/tree/main/docs/source/tutorials) using a personal M1 MacBook Air with 16GB RAM and completed most analysis steps, including model training, in minutes."
Software Dependencies: Yes. "scvi-tools v1.2.0 (https://scvi-tools.org/)"
Experiment Setup: Yes. "Autoencoders for latent mean and variance: one hidden layer with 128 hidden units, ReLU activation, batch normalization, and dropout. Predictor design (adapted from the scVIGenQCModel in the HCV paper): one hidden layer with 25 neurons, ReLU activation, and dropout."
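The architecture described in that excerpt can be sketched as follows. This assumes PyTorch (the framework is not stated in the excerpt), the class names are hypothetical, and the dropout rate is a placeholder since it is not specified here.

```python
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """One hidden layer of 128 units with ReLU, batch normalization, and
    dropout, emitting a latent mean and log-variance (hypothetical name)."""
    def __init__(self, n_in: int, n_latent: int, p_drop: float = 0.1):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(n_in, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(p_drop),     # rate not given in the excerpt
        )
        self.mean = nn.Linear(128, n_latent)
        self.log_var = nn.Linear(128, n_latent)

    def forward(self, x):
        h = self.hidden(x)
        return self.mean(h), self.log_var(h)

class Predictor(nn.Module):
    """One hidden layer with 25 neurons, ReLU, and dropout."""
    def __init__(self, n_in: int, n_out: int, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 25),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(25, n_out),
        )

    def forward(self, x):
        return self.net(x)
```

Emitting both a mean and a log-variance head from the shared hidden layer is the standard way to parameterize a Gaussian latent posterior in variational autoencoders.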