Contrastive dimension reduction: when and how?
Authors: Sam Hawke, YueEn Ma, Didong Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical support for our methods and validate their effectiveness through extensive simulated, semi-simulated, and real experiments involving images, gene expressions, protein expressions, and medical sensors, demonstrating their ability to identify the unique information in the foreground group. |
| Researcher Affiliation | Academia | Sam Hawke, Department of Biostatistics, University of North Carolina at Chapel Hill, shawke@unc.edu; YueEn Ma, Department of Statistics & Operations Research, University of North Carolina at Chapel Hill, myueen@unc.edu; Didong Li, Department of Biostatistics, University of North Carolina at Chapel Hill, didongli@unc.edu |
| Pseudocode | Yes | Algorithm 1: Contrastive Bootstrap Hypothesis Test and Algorithm 2: Contrastive Dimension Estimator |
| Open Source Code | Yes | All code available at https://github.com/myueen/contrastive-dimension-estimation |
| Open Datasets | Yes | Simulation 4. To assess our methods in a simulation setting where both dx and dy are unknown, we consider a very similar simulation to the previous one, but where we start with 5923 handwritten 0s from the MNIST dataset (Deng, 2012) instead of randomly generated disks. Corrupted MNIST. The corrupted MNIST dataset was previously studied extensively by Abid et al. (2018)... Mouse Protein. This dataset (Higuera et al., 2015) is a benchmark in the literature... mHealth. The mHealth dataset (Banos et al., 2015), studied in Abid et al. (2018); Severson et al. (2019)... BMMC. This is a single-cell RNA sequencing dataset (Zheng et al., 2017) studied in Abid et al. (2018)... Small Molecule. This is a dataset of cell line responses to small-molecule therapy (McFarland et al., 2020)... ECCITE-Seq. This ECCITE-Seq dataset (Mimitou et al., 2019)... Pathogen Data. Studied by Weinberger et al. (2023), this dataset (Haber et al., 2017)... Perturb-Seq: Studied by Weinberger et al. (2023), this dataset (Adamson et al., 2016)... CelebA: The CelebA dataset (Liu et al., 2015)... |
| Dataset Splits | No | The paper discusses training and testing, but it does not provide specific details on validation dataset splits, percentages, or methodology for its experiments. |
| Hardware Specification | Yes | All experiments were run on a Linux-based virtual computer with 6500 conventional compute cores delivering 13,000 threads. We used only 1 core. |
| Software Dependencies | No | The paper mentions software packages like 'sci-kit dimension package', 'anndata package', 'scanpy package', 'requests package', and 'pillow package', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | In all these examples, the threshold is ϵ = 0.1... We repeat this bootstrap procedure B (B = 1000 throughout this paper) times... with variance σ_x² = σ_y² = 0.25. |
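
The table reports a bootstrap procedure repeated B = 1000 times with a threshold ϵ = 0.1, but it does not state the test statistic used in Algorithm 1. The sketch below only illustrates the general shape of such a bootstrap and is not the authors' implementation. It assumes a hypothetical statistic, `unique_direction_score`, that measures how much of a foreground principal direction remains after projection onto the background principal subspace. The helper names, the toy simulator, and the assumption that the subspace dimensions `d_fg` and `d_bg` are known are all illustrative choices.

```python
import numpy as np


def top_subspace(data, dim):
    """Orthonormal basis (p x dim) of the top-`dim` principal subspace."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T


def unique_direction_score(fg, bg, d_fg, d_bg):
    """Hypothetical statistic (an assumption, not taken from the paper).

    Largest part of any foreground principal direction that is left over
    after projecting onto the background principal subspace. It is close
    to 0 when the foreground subspace sits inside the background subspace
    and close to 1 when some foreground direction is orthogonal to it.
    """
    v_fg = top_subspace(fg, d_fg)
    v_bg = top_subspace(bg, d_bg)
    residual = v_fg - v_bg @ (v_bg.T @ v_fg)
    return np.linalg.svd(residual, compute_uv=False).max()


def bootstrap_scores(fg, bg, d_fg, d_bg, n_boot=1000, seed=0):
    """Recompute the score on `n_boot` resamples (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        fg_idx = rng.integers(0, len(fg), size=len(fg))
        bg_idx = rng.integers(0, len(bg), size=len(bg))
        scores[b] = unique_direction_score(fg[fg_idx], bg[bg_idx], d_fg, d_bg)
    return scores


def simulate(n, signal_dims, p=10, scale=5.0, noise_sd=0.5, seed=0):
    """Toy data: strong signal on the first `signal_dims` coordinates plus
    Gaussian noise with standard deviation 0.5 (variance 0.25)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(scale=noise_sd, size=(n, p))
    data[:, :signal_dims] += rng.normal(scale=scale, size=(n, signal_dims))
    return data


if __name__ == "__main__":
    eps = 0.1  # threshold value quoted in the table
    bg = simulate(500, 2, seed=1)
    fg = simulate(500, 3, seed=2)  # one extra signal direction
    scores = bootstrap_scores(fg, bg, d_fg=3, d_bg=2, n_boot=200)
    print("fraction of replicates with score <= eps:", np.mean(scores <= eps))
```

A small fraction of replicates at or below ϵ would suggest, under these assumptions, that the foreground carries directions absent from the background. How the paper turns bootstrap replicates into a decision should be taken from Algorithm 1 itself or from the linked repository.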