Geometry Based Data Generation

Authors: Ofir Lindenbaum, Jay Stanley, Guy Wolf, Smita Krishnaswamy

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate how this approach corrects sampling biases and artifacts, thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations, and it is able to reveal intrinsic patterns and mutual-information relationships between genes on a single-cell RNA sequencing dataset of hematopoiesis.
Researcher Affiliation | Academia | Ofir Lindenbaum, Applied Mathematics Program, Yale University, New Haven, CT 06511, ofir.lindenbaum@yale.edu; Jay S. Stanley III, Computational Biology & Bioinformatics Program, Yale University, New Haven, CT 06510, jay.stanley@yale.edu; Guy Wolf, Applied Mathematics Program, Yale University, New Haven, CT 06511, guy.wolf@yale.edu; Smita Krishnaswamy, Departments of Genetics & Computer Science, Yale University, New Haven, CT 06510, smita.krishnaswamy@yale.edu
Pseudocode | Yes | Algorithm 1 SUGAR: Synthesis Using Geometrically Aligned Random-walks
Open Source Code | Yes | We note that a toolbox implementing the presented algorithm is available via GitHub³ for free academic use (see supplement for details), and we expect future work to apply SUGAR to study extremely biased biological datasets and improve classification and regression performance on them. ... ³ URL: github.com/KrishnaswamyLab/SUGAR
Open Datasets | Yes | To begin, we rotated an example of a handwritten 6 from the MNIST dataset... (Section 5.1). ... 61 imbalanced datasets of varying size (from hundreds to thousands) and imbalance ratio (1.8–130), obtained from Alcalá-Fdez et al. (2009) (Section 5.3). ... 115 datasets obtained from Alcalá-Fdez et al. (2009) (Section 5.4). ... In Velten et al. (2017), a high dimensional yet small (X ∈ R^{1029 × 12553}) single-cell RNA sequencing (scRNA-seq) dataset was collected (Section 5.5).
Dataset Splits | Yes | We compared SUGAR, RUSBoost, and SMOTE for improving k-NN and kernel SVM classification of 61 imbalanced datasets... using 10-fold cross validation (Section 5.3).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions using a VAE and GAN for comparison but does not specify software dependencies with version numbers for its own method or the tools used (e.g., Python version, library versions like TensorFlow, PyTorch, or scikit-learn).
Experiment Setup | No | While Algorithm 1 outlines the steps for SUGAR, the paper does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other detailed experimental configuration settings for any of its experiments.
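The Dataset Splits entry mentions 10-fold cross validation on imbalanced datasets with imbalance ratios between 1.8 and 130. As a rough illustration of what such a protocol could look like, the sketch below computes an imbalance ratio (majority count divided by minority count) and builds 10 folds in pure Python. It is not the authors' evaluation code. Stratifying the folds by class is an assumption of this sketch, since the summary only says "10-fold cross validation", and the function names and toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold.

    Stratification is an assumption of this sketch; the source only
    states that 10-fold cross validation was used.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves as the test set once; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train_idx, test_idx


# Hypothetical toy labels: 180 majority samples and 20 minority samples.
labels = [0] * 180 + [1] * 20
splits = list(stratified_k_fold(labels, k=10))
```

In a comparison like the one described, any oversampling method (SUGAR, SMOTE) would normally be applied to the training indices of each fold only, so that synthetic points never leak into the held-out fold. That is general practice, not something the summary above confirms for this paper.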