Propensity Score Alignment of Unpaired Multimodal Data

Authors: Johnny Xi, Jana Osea, Zuheng Xu, Jason S. Hartford

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a comprehensive evaluation of our proposed methodology on three distinct datasets: (1) synthetic paired images, (2) a single-cell CITE-seq dataset (simultaneous measurement of single-cell RNA-seq and surface protein measurements) [Stoeckius et al., 2017], and (3) Perturb-seq and single-cell image data.
Researcher Affiliation | Collaboration | Johnny Xi, Department of Statistics, University of British Columbia, Vancouver, Canada (johnny.xi@stat.ubc.ca); Jana Osea, Valence Labs, Montreal, Canada (jana@valencelabs.com); Zuheng (David) Xu, Department of Statistics, University of British Columbia, Vancouver, Canada (zuheng.xu@stat.ubc.ca); Jason Hartford, Valence Labs, London, UK (jason@valencelabs.com)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Full code, complete with scripts to obtain synthetic or publicly available data for experimental datasets (1) and (2), is provided as supplementary material.
Open Datasets | Yes | We used the CITE-seq dataset from the NeurIPS 2021 Multimodal single-cell data integration competition [Lance et al., 2022], consisting of paired RNA-seq and surface-level protein measurements over 45 cell types. ... (obtained from GEO accession GSE194122).
Dataset Splits | Yes | "All models are saved at the optimal validation loss to perform subsequent matching." ... "In the first two cases, there is a ground-truth matching that we use for evaluation, but samples are randomly permuted during training." ... "We evaluated the predictive models against ground truth pairs by computing the prediction R² (higher is better) on a held-out, unpermuted, test set."
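The quoted evaluation protocol, prediction R² against ground-truth pairs on a held-out unpermuted test set, can be sketched as follows. The data and noise level here are illustrative stand-ins, not the paper's actual model outputs:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical held-out test set with a known ground-truth matching.
y_true = rng.normal(size=(200, 5))                  # ground-truth targets
y_pred = y_true + 0.1 * rng.normal(size=(200, 5))   # a model's predictions (toy)

# Prediction R^2 on the held-out, unpermuted test set (higher is better).
r2 = r2_score(y_true, y_pred)
```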
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications; it only describes the neural network architectures used.
Software Dependencies | Yes | All models for the experiments are implemented using Torch v2.2.2 [Paszke et al., 2017] and PyTorch Lightning v2.2.4 [Falcon and PyTorch Lightning Team, 2023]. ... Shared nearest neighbours (SNN) is implemented using scikit-learn v1.4.0 [Pedregosa et al., 2011] using a single neighbour, and OT is implemented using the Sinkhorn algorithm as implemented in the pot v0.9.3 package [Flamary et al., 2021].
Experiment Setup | Yes | We use the Adam optimizer with learning rate 0.0001 and a one-cycle learning rate scheduler. We follow Yang et al. [2021] and set α = 1, β = 0.1, but found that λ = 10^9 (compared to λ = 10^7 in Yang et al. [2021]) resulted in better performance. We used batch size 256 in both instances and trained for either 100 epochs (image) or 250 epochs (CITE-seq).
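The quoted optimizer configuration (Adam at learning rate 0.0001 with a one-cycle scheduler and batch size 256) can be set up in PyTorch along these lines. The model, step counts, and input dimensions below are placeholders, since the paper's architectures are not reproduced here:

```python
import torch

# Placeholder model standing in for the paper's encoder (architecture is hypothetical).
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16)
)

epochs, steps_per_epoch, batch_size = 100, 50, 256  # 100 epochs as in the image experiments
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, epochs=epochs, steps_per_epoch=steps_per_epoch
)

# One illustrative training step with a dummy loss:
x = torch.randn(batch_size, 32)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
scheduler.step()  # one-cycle: step the scheduler once per batch
```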