Contrastive dimension reduction: when and how?

Authors: Sam Hawke, YueEn Ma, Didong Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical support for our methods and validate their effectiveness through extensive simulated, semi-simulated, and real experiments involving images, gene expressions, protein expressions, and medical sensors, demonstrating their ability to identify the unique information in the foreground group.
Researcher Affiliation | Academia | Sam Hawke, Department of Biostatistics, University of North Carolina at Chapel Hill, shawke@unc.edu; YueEn Ma, Department of Statistics & Operations Research, University of North Carolina at Chapel Hill, myueen@unc.edu; Didong Li, Department of Biostatistics, University of North Carolina at Chapel Hill, didongli@unc.edu
Pseudocode | Yes | Algorithm 1: Contrastive Bootstrap Hypothesis Test and Algorithm 2: Contrastive Dimension Estimator
Open Source Code | Yes | All code available at https://github.com/myueen/contrastive-dimension-estimation
Open Datasets | Yes | Simulation 4. To assess our methods in a simulation setting where both d_x and d_y are unknown, we consider a very similar simulation to the previous one, but where we start with 5923 handwritten 0s from the MNIST dataset (Deng, 2012) instead of randomly generated disks. Corrupted MNIST. The corrupted MNIST dataset was previously studied extensively by Abid et al. (2018)... Mouse Protein. This dataset (Higuera et al., 2015) is a benchmark in the literature... mHealth. The mHealth dataset (Banos et al., 2015), studied in Abid et al. (2018); Severson et al. (2019)... BMMC. This is a single-cell RNA sequencing dataset (Zheng et al., 2017) studied in Abid et al. (2018)... Small Molecule. This is a dataset of cell line responses to small-molecule therapy (McFarland et al., 2020)... ECCITE-Seq. This ECCITE-Seq dataset (Mimitou et al., 2019)... Pathogen Data. Studied by Weinberger et al. (2023), this dataset (Haber et al., 2017)... Perturb-Seq: Studied by Weinberger et al. (2023), this dataset (Adamson et al., 2016)... CelebA: The CelebA dataset (Liu et al., 2015)...
Dataset Splits | No | The paper discusses training and testing, but it does not provide specific details on validation dataset splits, percentages, or methodology for its experiments.
Hardware Specification | Yes | All experiments were run on a Linux-based virtual computer with 6500 conventional compute cores delivering 13,000 threads. We used only 1 core.
Software Dependencies | No | The paper mentions software packages like 'sci-kit dimension package', 'anndata package', 'scanpy package', 'requests package', and 'pillow package', but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | In all these examples, the threshold is ϵ = 0.1. We repeat this bootstrap procedure B (B = 1000 throughout this paper) times. ... with variance σ_x^2 = σ_y^2 = 0.25.
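
The Experiment Setup entry quotes a threshold ϵ = 0.1, but this summary does not reproduce Algorithm 2 itself. The sketch below is only a rough illustration of one way such a threshold could be used, assuming it is applied to the cosines of the principal angles between a foreground and a background subspace; that assumption, and the function names `top_subspace` and `count_unique_directions`, are hypothetical and not taken from the paper or its repository.

```python
import numpy as np


def top_subspace(X, d):
    """Return an orthonormal basis (as columns) of the top-d principal
    subspace of X, where rows of X are samples."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                              # shape: (n_features, d)


def count_unique_directions(Vx, Vy, eps=0.1):
    """Count foreground directions not (approximately) shared with the
    background. ASSUMPTION: a direction counts as shared when the cosine
    of its principal angle exceeds 1 - eps. Illustrative only."""
    # Singular values of Vx^T Vy are the cosines of the principal angles
    # between the two subspaces (both bases must be orthonormal).
    cosines = np.linalg.svd(Vx.T @ Vy, compute_uv=False)
    shared = int(np.sum(cosines > 1.0 - eps))
    return Vx.shape[1] - shared
```

For example, if the foreground basis spans the first two coordinate axes and the background basis spans only the first, `count_unique_directions` reports one unique foreground direction. The bases would typically come from `top_subspace` applied to each group, with the subspace dimensions chosen separately.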