Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On UMAP's True Loss Function

Authors: Sebastian Damrich, Fred A. Hamprecht

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We corroborate our theoretical findings on toy and single cell RNA sequencing data.
Researcher Affiliation Academia Sebastian Damrich Fred A. Hamprecht HCI/IWR at Heidelberg University, 69120 Heidelberg, Germany EMAIL
Pseudocode Yes Algorithm 1: UMAP's optimization
Open Source Code Yes Our code is publicly available at https://github.com/hci-unihd/UMAPs-true-loss.
Open Datasets Yes We illustrate our analysis on gene expression measurements of 86024 cells of C. elegans [16, 14]. We start out with a 100 dimensional PCA of the data obtained from http://cb.csail.mit.edu/cb/densvis/datasets/. We informed the authors of our use of the dataset, which they license under CC BY-NC 2.0.
Dataset Splits No The paper does not explicitly provide training/validation/test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies No The paper mentions a GitHub repository for their code but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup Yes We start out with a 100 dimensional PCA of the data and use the cosine metric in high-dimensional space, consider k = 30 neighbors and optimize for 750 epochs, similar to [14].
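
The experiment setup quoted in the last row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `UMAPSetup` and its method `umap_kwargs` are hypothetical, and the sketch assumes the stock umap-learn package, whose `UMAP` constructor accepts `n_neighbors`, `metric`, and `n_epochs`. The paper's own repository may configure its runs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UMAPSetup:
    """Hypothetical container for the hyperparameters quoted in the table."""

    pca_dims: int = 100      # dimensionality of the PCA input (recorded for reference)
    metric: str = "cosine"   # distance used in high-dimensional space
    n_neighbors: int = 30    # k, the number of nearest neighbors
    n_epochs: int = 750      # number of optimization epochs

    def umap_kwargs(self) -> dict:
        """Keyword arguments for umap-learn's UMAP constructor (assumed target)."""
        return {
            "n_neighbors": self.n_neighbors,
            "metric": self.metric,
            "n_epochs": self.n_epochs,
        }


setup = UMAPSetup()
kwargs = setup.umap_kwargs()
print(kwargs)
```

If umap-learn is installed, `umap.UMAP(**kwargs).fit_transform(X)` would embed an array `X` under these settings. Because the downloaded data is described as already being a 100-dimensional PCA, `pca_dims` is kept only as a record and is not passed on.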