reproducibilityindex.ai

Contrastive dimension reduction: when and how?

Authors: Sam Hawke, YueEn Ma, Didong Li

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide theoretical support for our methods and validate their effectiveness through extensive simulated, semi-simulated, and real experiments involving images, gene expressions, protein expressions, and medical sensors, demonstrating their ability to identify the unique information in the foreground group.
Researcher Affiliation	Academia	Sam Hawke, Department of Biostatistics, University of North Carolina at Chapel Hill, shawke@unc.edu; YueEn Ma, Department of Statistics & Operations Research, University of North Carolina at Chapel Hill, myueen@unc.edu; Didong Li, Department of Biostatistics, University of North Carolina at Chapel Hill, didongli@unc.edu
Pseudocode	Yes	Algorithm 1: Contrastive Bootstrap Hypothesis Test and Algorithm 2: Contrastive Dimension Estimator
Open Source Code	Yes	All code available at https://github.com/myueen/contrastive-dimension-estimation
Open Datasets	Yes	Simulation 4. To assess our methods in a simulation setting where both d_x and d_y are unknown, we consider a very similar simulation to the previous one, but where we start with 5923 handwritten 0s from the MNIST dataset (Deng, 2012) instead of randomly generated disks. Corrupted MNIST. The corrupted MNIST dataset was previously studied extensively by Abid et al. (2018)... Mouse Protein. This dataset (Higuera et al., 2015) is a benchmark in the literature... mHealth. The mHealth dataset (Banos et al., 2015), studied in Abid et al. (2018); Severson et al. (2019)... BMMC. This is a single-cell RNA sequencing dataset (Zheng et al., 2017) studied in Abid et al. (2018)... Small Molecule. This is a dataset of cell line responses to small-molecule therapy (McFarland et al., 2020)... ECCITE-Seq. This ECCITE-Seq dataset (Mimitou et al., 2019)... Pathogen Data. Studied by Weinberger et al. (2023), this dataset (Haber et al., 2017)... Perturb-Seq: Studied by Weinberger et al. (2023), this dataset (Adamson et al., 2016)... CelebA: The CelebA dataset (Liu et al., 2015)...
Dataset Splits	No	The paper discusses training and testing, but it does not provide specific details on validation dataset splits, percentages, or methodology for its experiments.
Hardware Specification	Yes	All experiments were run on a Linux-based virtual computer with 6500 conventional compute cores delivering 13,000 threads. We used only 1 core.
Software Dependencies	No	The paper mentions software packages like 'sci-kit dimension package', 'anndata package', 'scanpy package', 'requests package', and 'pillow package', but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	In all these examples, the threshold is ϵ = 0.1... We repeat this bootstrap procedure B (B = 1000 throughout this paper) times... with variance σ_x^2 = σ_y^2 = 0.25.
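
The setup values quoted above (threshold ϵ = 0.1, B = 1000 bootstrap repetitions, noise variance 0.25) can be seen together in a small sketch. This is not the paper's Algorithm 1 or Algorithm 2. It is a generic illustration that assumes the statistic of interest is the number of foreground principal directions whose principal-angle cosine with the background subspace falls below 1 - ϵ. The function names, the simulated data, and the choice of statistic are all hypothetical; the authors' actual implementation is in the repository linked in the Open Source Code row.

```python
import numpy as np


def simulate(rng, n, p, signal_dims, signal_sd=5.0, noise_var=0.25):
    """Hypothetical toy data: Gaussian noise plus strong signal on chosen axes."""
    data = rng.normal(0.0, np.sqrt(noise_var), size=(n, p))
    data[:, signal_dims] += rng.normal(0.0, signal_sd, size=(n, len(signal_dims)))
    return data


def top_subspace(data, dim):
    """Orthonormal basis (p x dim) of the top-`dim` principal subspace."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T


def count_unique_directions(fg, bg, d_fg, d_bg, eps=0.1):
    """Assumed statistic: d_fg minus the number of principal-angle cosines >= 1 - eps."""
    vx = top_subspace(fg, d_fg)
    vy = top_subspace(bg, d_bg)
    # Singular values of vx^T vy are the cosines of the principal angles.
    cosines = np.linalg.svd(vx.T @ vy, compute_uv=False)
    return d_fg - int(np.sum(cosines >= 1.0 - eps))


def bootstrap_counts(fg, bg, d_fg, d_bg, n_boot=1000, eps=0.1, seed=0):
    """Recompute the statistic on n_boot resamples (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    counts = np.empty(n_boot, dtype=int)
    for b in range(n_boot):
        fg_idx = rng.integers(0, fg.shape[0], size=fg.shape[0])
        bg_idx = rng.integers(0, bg.shape[0], size=bg.shape[0])
        counts[b] = count_unique_directions(fg[fg_idx], bg[bg_idx], d_fg, d_bg, eps)
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fg = simulate(rng, n=500, p=10, signal_dims=[0, 1, 2])  # foreground: 3 signal axes
    bg = simulate(rng, n=500, p=10, signal_dims=[0, 1])     # background shares 2 of them
    counts = bootstrap_counts(fg, bg, d_fg=3, d_bg=2, n_boot=1000, eps=0.1)
    print(np.bincount(counts, minlength=4))
```

In this toy setting the foreground has one signal axis that the background lacks, so the point estimate is 1. The bootstrap distribution of the count indicates how stable that estimate is under resampling.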