Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
On UMAP's True Loss Function
Authors: Sebastian Damrich, Fred A. Hamprecht
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical findings on toy and single cell RNA sequencing data. |
| Researcher Affiliation | Academia | Sebastian Damrich Fred A. Hamprecht HCI/IWR at Heidelberg University, 69120 Heidelberg, Germany EMAIL |
| Pseudocode | Yes | Algorithm 1: UMAP's optimization |
| Open Source Code | Yes | Our code is publicly available at https://github.com/hci-unihd/UMAPs-true-loss. |
| Open Datasets | Yes | We illustrate our analysis on gene expression measurements of 86024 cells of C. elegans [16, 14]. We start out with a 100 dimensional PCA of the data obtained from http://cb.csail.mit.edu/cb/densvis/datasets/. We informed the authors of our use of the dataset, which they license under CC BY-NC 2.0. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions a GitHub repository for their code but does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We start out with a 100 dimensional PCA of the data and use the cosine metric in high-dimensional space, consider k = 30 neighbors and optimize for 750 epochs, similar to [14]. |
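
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: it assumes the third-party `umap-learn` package, whose `UMAP` class accepts `n_neighbors`, `metric`, and `n_epochs`, and the names `UmapSetup`, `umap_kwargs`, and `embed` are hypothetical. The paper's repository may configure its experiments differently.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UmapSetup:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""

    n_pca_components: int = 100  # dimensionality of the PCA-reduced input
    metric: str = "cosine"       # metric used in high-dimensional space
    n_neighbors: int = 30        # k nearest neighbors
    n_epochs: int = 750          # number of optimization epochs


def umap_kwargs(setup: UmapSetup) -> dict:
    """Map the setup onto keyword arguments of umap-learn's UMAP constructor."""
    kwargs = asdict(setup)
    # PCA happens upstream of UMAP, so it is not a UMAP argument.
    kwargs.pop("n_pca_components")
    return kwargs


def embed(pca_data, setup: UmapSetup = UmapSetup()):
    """Embed PCA-reduced data with umap-learn (assumed to be installed)."""
    import umap  # imported lazily so the helpers above work without it

    if pca_data.shape[1] != setup.n_pca_components:
        raise ValueError(
            f"expected {setup.n_pca_components} PCA dimensions, "
            f"got {pca_data.shape[1]}"
        )
    return umap.UMAP(**umap_kwargs(setup)).fit_transform(pca_data)
```

Calling `embed(pca_data)` on an array with 100 columns would run UMAP with these settings. The shape check only guards against passing data that has not been reduced to the stated PCA dimensionality.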