reproducibilityindex.ai

Geometry Based Data Generation

Authors: Ofir Lindenbaum, Jay Stanley, Guy Wolf, Smita Krishnaswamy

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate how this approach corrects sampling biases and artifacts, thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations, and it is able to reveal intrinsic patterns and mutual-information relationships between genes on a single-cell RNA sequencing dataset of hematopoiesis.
Researcher Affiliation	Academia	Ofir Lindenbaum, Applied Mathematics Program, Yale University, New Haven, CT 06511, ofir.lindenbaum@yale.edu; Jay S. Stanley III, Computational Biology & Bioinformatics Program, Yale University, New Haven, CT 06510, jay.stanley@yale.edu; Guy Wolf, Applied Mathematics Program, Yale University, New Haven, CT 06511, guy.wolf@yale.edu; Smita Krishnaswamy, Departments of Genetics & Computer Science, Yale University, New Haven, CT 06510, smita.krishnawamy@yale.edu
Pseudocode	Yes	Algorithm 1 SUGAR: Synthesis Using Geometrically Aligned Random-walks
Open Source Code	Yes	We note that a toolbox implementing the presented algorithm is available via GitHub³ for free academic use (see supplement for details), and we expect future work to apply SUGAR to study extremely biased biological datasets and improve classification and regression performance on them. ... ³URL: github.com/KrishnaswamyLab/SUGAR
Open Datasets	Yes	To begin, we rotated an example of a handwritten 6 from the MNIST dataset... (Section 5.1). ... 61 imbalanced datasets of varying size (from hundreds to thousands) and imbalance ratio (1.8–130), obtained from Alcalá-Fdez et al. (2009) (Section 5.3). ... 115 datasets obtained from Alcalá-Fdez et al. (2009) (Section 5.4). ... In Velten et al. (2017), a high dimensional yet small ($X \in \mathbb{R}^{1029 \times 12553}$) single-cell RNA sequencing (scRNA-seq) dataset was collected (Section 5.5).
Dataset Splits	Yes	We compared SUGAR, RUSBoost, and SMOTE for improving k-NN and kernel SVM classification of 61 imbalanced datasets... using 10-fold cross validation (Section 5.3).
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies	No	The paper mentions using a VAE and GAN for comparison but does not specify software dependencies with version numbers for its own method or the tools used (e.g., Python version, library versions like TensorFlow, PyTorch, or scikit-learn).
Experiment Setup	No	While Algorithm 1 outlines the steps for SUGAR, the paper does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other detailed experimental configuration settings for any of its experiments.
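
The Dataset Splits and Open Datasets rows mention 10-fold cross validation and an imbalance ratio, but the paper excerpts do not say how the folds were drawn. The sketch below is one possible reading, not the authors' protocol: it assumes shuffled, non-stratified folds and takes the imbalance ratio to be the majority-class count divided by the minority-class count. The function names, the seed, and the toy labels are all hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical toy labels: 90 majority samples and 10 minority samples.
labels = [0] * 90 + [1] * 10
ratio = imbalance_ratio(labels)
splits = list(k_fold_indices(len(labels), k=10, seed=0))
```

With shuffled folds on a small minority class, some test folds may contain no minority samples at all, so a stratified split may be a better fit when reproducing the imbalanced-classification comparison.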