Propensity Score Alignment of Unpaired Multimodal Data
Authors: Johnny Xi, Jana Osea, Zuheng Xu, Jason S. Hartford
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a comprehensive evaluation of our proposed methodology on three distinct datasets: (1) synthetic paired images, (2) single-cell CITE-seq dataset (simultaneous measurement of single-cell RNA-seq and surface protein measurements) [Stoeckius et al., 2017], and (3) Perturb-seq and single-cell image data. |
| Researcher Affiliation | Collaboration | Johnny Xi Department of Statistics University of British Columbia Vancouver, Canada johnny.xi@stat.ubc.ca Jana Osea Valence Labs Montreal, Canada jana@valencelabs.com Zuheng (David) Xu Department of Statistics University of British Columbia Vancouver, Canada zuheng.xu@stat.ubc.ca Jason Hartford Valence Labs London, UK jason@valencelabs.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Full code, complete with scripts to obtain synthetic or publicly available data for experimental datasets (1) and (2), is provided as supplementary material. |
| Open Datasets | Yes | We used the CITE-seq dataset from the NeurIPS 2021 Multimodal single-cell data integration competition [Lance et al., 2022], consisting of paired RNA-seq and surface-level protein measurements over 45 cell types. ... (obtained from GEO accession GSE194122). |
| Dataset Splits | Yes | "All models are saved at the optimal validation loss to perform subsequent matching." and "In the first two cases, there is a ground-truth matching that we use for evaluation, but samples are randomly permuted during training." and "We evaluated the predictive models against ground truth pairs by computing the prediction R2 (higher is better) on a held-out, unpermuted, test set." |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications. It only describes the neural network architectures used. |
| Software Dependencies | Yes | All models for the experiments are implemented using Torch v2.2.2 [Paszke et al., 2017] and PyTorch Lightning v2.2.4 [Falcon and PyTorch Lightning Team, 2023]. ... Shared nearest neighbours (SNN) is implemented using scikit-learn v1.4.0 [Pedregosa et al., 2011] using a single neighbour, and OT is implemented using the Sinkhorn algorithm as implemented in the pot v0.9.3 package [Flamary et al., 2021]. |
| Experiment Setup | Yes | We use the Adam optimizer with learning rate 0.0001 and a one-cycle learning-rate scheduler. We follow Yang et al. [2021] and set α = 1, β = 0.1, but found that λ = 10^9 (compared to λ = 10^7 in Yang et al. [2021]) resulted in better performance. We used batch size 256 in both instances and trained for either 100 epochs (image) or 250 epochs (CITE-seq). |
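The two matching procedures named in the software row (single-neighbour SNN and Sinkhorn-based OT) can be sketched in plain NumPy. This is an illustrative toy, not the paper's implementation: the data, noise level, and regularisation strength are assumptions, and the hand-rolled Sinkhorn loop stands in for what `ot.sinkhorn` from the pot package computes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 8
x = rng.normal(size=(n, d))             # modality-1 representations (toy)
y = x + 0.05 * rng.normal(size=(n, d))  # noisy, unpaired modality-2 counterparts

# Pairwise squared-Euclidean cost matrix between the two modalities
M = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

# Nearest-neighbour matching with a single neighbour
nn_match = M.argmin(axis=1)

# Entropic OT coupling via plain Sinkhorn iterations on uniform marginals
reg = 0.05
K = np.exp(-M / M.max() / reg)          # Gibbs kernel on the normalised cost
a = b = np.ones(n) / n
u = np.ones(n) / n
for _ in range(200):
    v = b / (K.T @ u)                   # alternate scaling to match marginals
    u = a / (K @ v)
plan = u[:, None] * K * v[None, :]      # soft coupling (rows/cols sum to 1/n)
ot_match = plan.argmax(axis=1)          # hard assignment from the coupling
```

With this low noise level both procedures recover the ground-truth pairing almost exactly, which is the behaviour the paper's R2 evaluation probes on held-out pairs.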
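The experiment-setup row (Adam at lr 1e-4, one-cycle scheduler, batch size 256) maps directly onto standard PyTorch calls. A minimal sketch follows; the tiny `nn.Sequential` model and the step counts are placeholder assumptions, not the paper's architecture.

```python
import torch
from torch import nn

# Placeholder model standing in for the paper's encoders
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

epochs, steps_per_epoch = 100, 10   # paper: 100 (image) or 250 (CITE-seq) epochs
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=1e-4, epochs=epochs, steps_per_epoch=steps_per_epoch)

# One optimisation step on a dummy batch of the reported size
x = torch.randn(256, 8)             # batch size 256, as in the paper
y = torch.randn(256, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
sched.step()                        # one-cycle: lr warms up then anneals
```

`OneCycleLR` starts below `max_lr` and ramps up before annealing, so the instantaneous learning rate never exceeds the configured 1e-4.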