Geometry Based Data Generation
Authors: Ofir Lindenbaum, Jay Stanley, Guy Wolf, Smita Krishnaswamy
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate how this approach corrects sampling biases and artifacts, thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations, and it is able to reveal intrinsic patterns and mutual-information relationships between genes on a single-cell RNA sequencing dataset of hematopoiesis. |
| Researcher Affiliation | Academia | Ofir Lindenbaum, Applied Mathematics Program, Yale University, New Haven, CT 06511, ofir.lindenbaum@yale.edu; Jay S. Stanley III, Computational Biology & Bioinformatics Program, Yale University, New Haven, CT 06510, jay.stanley@yale.edu; Guy Wolf, Applied Mathematics Program, Yale University, New Haven, CT 06511, guy.wolf@yale.edu; Smita Krishnaswamy, Departments of Genetics & Computer Science, Yale University, New Haven, CT 06510, smita.krishnawamy@yale.edu |
| Pseudocode | Yes | Algorithm 1 SUGAR: Synthesis Using Geometrically Aligned Random-walks |
| Open Source Code | Yes | We note that a toolbox implementing the presented algorithm is available via GitHub (footnote 3) for free academic use (see supplement for details), and we expect future work to apply SUGAR to study extremely biased biological datasets and improve classification and regression performance on them. ... Footnote 3: URL: github.com/KrishnaswamyLab/SUGAR |
| Open Datasets | Yes | To begin, we rotated an example of a handwritten 6 from the MNIST dataset... (Section 5.1). ... 61 imbalanced datasets of varying size (from hundreds to thousands) and imbalance ratio (1.8–130), obtained from Alcalá-Fdez et al. (2009) (Section 5.3). ... 115 datasets obtained from Alcalá-Fdez et al. (2009) (Section 5.4). ... In Velten et al. (2017), a high dimensional yet small ($X \in \mathbb{R}^{1029 \times 12553}$) single-cell RNA sequencing (scRNA-seq) dataset was collected (Section 5.5). |
| Dataset Splits | Yes | We compared SUGAR, RUSBoost, and SMOTE for improving k-NN and kernel SVM classification of 61 imbalanced datasets... using 10-fold cross validation (Section 5.3). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using a VAE and GAN for comparison but does not specify software dependencies with version numbers for its own method or the tools used (e.g., Python version, library versions like TensorFlow, PyTorch, or scikit-learn). |
| Experiment Setup | No | While Algorithm 1 outlines the steps for SUGAR, the paper does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other detailed experimental configuration settings for any of its experiments. |