Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis
Authors: Jiayu Su, David A Knowles, Raúl Rabadán
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate sisPCA's connections with autoencoders and regularized linear regression and showcase its ability to identify and separate hidden data structures through extensive applications, including breast cancer diagnosis from image features, learning aging-associated DNA methylation changes, and single-cell analysis of malaria infection. |
| Researcher Affiliation | Academia | Jiayu Su (1,2,5), David A. Knowles (2,4,5), Raúl Rabadán (1,2,3); 1: Program for Mathematical Genomics; 2: Department of Systems Biology, Columbia University; 3: Department of Biomedical Informatics, Columbia University; 4: Department of Computer Science, Columbia University; 5: New York Genome Center |
| Pseudocode | Yes | Algorithm 1: Solving sisPCA-linear using alternating eigendecomposition |
| Open Source Code | Yes | A Python implementation of sisPCA is available on GitHub at https://github.com/JiayuSuPKU/sispca (DOI 10.5281/zenodo.13932660). The repository also includes notebooks to reproduce results in this paper. |
| Open Datasets | Yes | uciml/breast-cancer-wisconsin-data, CC BY-NC-SA 4.0 license. |
| Dataset Splits | No | The paper describes data preprocessing steps, such as selecting the top 2,000 highly variable genes for scRNA-seq data and downsampling the TCGA dataset, but it does not specify explicit training, validation, or test splits. |
| Hardware Specification | Yes | We ran all provided notebooks (https://github.com/JiayuSuPKU/sispca/tree/main/docs/source/tutorials) using a personal M1 MacBook Air with 16GB RAM and completed most analysis steps, including model training, in minutes. |
| Software Dependencies | Yes | scvi-tools v1.2.0 (https://scvi-tools.org/) |
| Experiment Setup | Yes | Autoencoders for latent mean and variance: one hidden layer with 128 hidden units, ReLU activation, batch normalization, and dropout. Predictor design (adapted from the scVI GenQCModel in the HCV paper): one hidden layer with 25 neurons, ReLU activation, and dropout. |
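The encoder configuration quoted in the Experiment Setup row (one hidden layer with 128 units, ReLU activation, dropout, and separate heads for the latent mean and variance) can be sketched as a plain NumPy forward pass. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: all weight names and sizes here are hypothetical, batch normalization is omitted for brevity, and the actual code presumably uses a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def encoder_forward(x, W1, b1, W_mu, b_mu, W_lv, b_lv,
                    dropout_p=0.1, train=True):
    """One-hidden-layer encoder producing latent mean and log-variance.

    Illustrative sketch: batch normalization from the paper's setup is
    omitted, and dropout uses the standard "inverted" formulation.
    """
    # Hidden layer: linear map followed by ReLU
    h = relu(x @ W1 + b1)
    if train:
        # Inverted dropout: zero units with prob dropout_p, rescale the rest
        mask = rng.random(h.shape) > dropout_p
        h = h * mask / (1.0 - dropout_p)
    # Two linear heads: latent mean and log-variance
    return h @ W_mu + b_mu, h @ W_lv + b_lv

# Toy example (hypothetical sizes): 8 samples, 30 features,
# 128 hidden units as in the quoted setup, 10 latent dimensions
n, d, hdim, k = 8, 30, 128, 10
W1, b1 = rng.normal(size=(d, hdim)) * 0.05, np.zeros(hdim)
W_mu, b_mu = rng.normal(size=(hdim, k)) * 0.05, np.zeros(k)
W_lv, b_lv = rng.normal(size=(hdim, k)) * 0.05, np.zeros(k)
mu, log_var = encoder_forward(rng.normal(size=(n, d)),
                              W1, b1, W_mu, b_mu, W_lv, b_lv)
```

At inference time one would pass `train=False` so that dropout is disabled and the forward pass is deterministic given the weights.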