Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity

Authors: Pierre Houédry, Nicolas Courty, Florestan Martin-Baillon, Laetitia Chapel, Titouan Vayer

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the experimental Section 4, we first evaluate DELTAZERO in a controlled setting to assess its ability to provide hierarchical clusters. We then measure its ability to generate low distortion tree metric approximations in two contexts, where unweighted and weighted graphs are at stake.
Researcher Affiliation	Academia	Pierre Houédry, Université Bretagne Sud, IRISA, UMR 6074, CNRS, EMAIL; Nicolas Courty, Université Bretagne Sud, IRISA, UMR 6074, CNRS, EMAIL; Florestan Martin-Baillon, Université de Rennes, IRMAR, UMR 6625, CNRS, EMAIL; Laetitia Chapel, L'Institut Agro Rennes-Angers, IRISA, UMR 6074, CNRS, EMAIL; Titouan Vayer, INRIA, ENS de Lyon, CNRS, Université Claude Bernard Lyon 1, LIP, UMR 5668, EMAIL
Pseudocode	Yes	Algorithm 1: DELTAZERO; Algorithm 2: GROMOV
Open Source Code	Yes	To ensure reproducibility, we provide all code and experiments at https://github.com/pierrehouedry/DifferentiableHyperbolicity.
Open Datasets	Yes	For C-ELEGAN and CS PHD, we relied on the pre-computed distance matrices provided at https://github.com/rsonthel/TreeRep. The C-ELEGAN dataset captures the neural connectivity of the Caenorhabditis elegans roundworm, a widely studied model organism in neuroscience. The CS PHD dataset represents a co-authorship network among computer science PhD holders, reflecting patterns of academic collaboration. For CORA and AIRPORT, we used the graph data from https://github.com/HazyResearch/hgcn and computed the shortest-path distance matrices on the largest connected components. CORA is a citation network of machine learning publications, while AIRPORT models air traffic routes between airports. The WIKI dataset, representing a hyperlink graph between Wikipedia pages, was obtained from the torch_geometric library [15]. ZEISEL is a single-cell RNA sequencing dataset describing the transcriptomic profiles of mouse cortex and hippocampus cells, commonly used to study cell type organization in neurobiology. It was obtained from https://github.com/solevillar/scGeneFit-python. The IBD dataset (Inflammatory Bowel Disease) contains 396 metagenomic samples across 606 expressed microbial species. The dataset is publicly available via BioProject accession number PRJEB1220.
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits. It mentions '30 repetitions are performed' for synthetic data and '100 runs using the same randomly sampled roots' for real datasets, which are related to evaluation runs rather than standard dataset partitioning.
Hardware Specification	Yes	All experiments were conducted using a single NVIDIA TITAN RTX GPU with 24 GB of VRAM.
Software Dependencies	No	Our optimization procedure was implemented in Python using PyTorch [33] for automatic differentiation and gradient-based optimization. All experiments were run using PyTorch's GPU backend with CUDA acceleration when available. The implementation was written in Python 3.11 and executed on a machine running Ubuntu 22.04.
Experiment Setup	Yes	For DELTAZERO, we perform grid search over the following hyperparameters: learning rate ϵ ∈ {0.1, 0.01, 0.001}, distance regularization coefficient µ ∈ {0.1, 0.01, 1.0}, and δ-scaling parameter λ ∈ {0.01, 0.1, 1.0, 10.0}. We fix the number of training epochs to T = 1000, batch size m = 32, and vary the number of batches K ∈ {100, 500, 1000, 3000, 5000}. For each setting, we select the best configuration which leads to the minimal distortion. To ensure stability, we apply early stopping with a patience of 50 epochs and retain the model with the best training loss.
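
The Experiment Setup row describes a grid search with early stopping. The sketch below shows one way such a search could be organized, and it is not the authors' code. The grid values and the patience of 50 come from the quoted text. The names `hyperparameter_grid`, `select_best`, and `early_stopping_epochs`, and the `evaluate` callable (a stand-in for training DELTAZERO with one configuration and returning its distortion), are hypothetical.

```python
from itertools import product

# Grid values as quoted in the Experiment Setup row.
LEARNING_RATES = [0.1, 0.01, 0.001]        # learning rate (epsilon)
DISTANCE_REGS = [0.1, 0.01, 1.0]           # distance regularization (mu)
DELTA_SCALES = [0.01, 0.1, 1.0, 10.0]      # delta-scaling parameter (lambda)
NUM_BATCHES = [100, 500, 1000, 3000, 5000] # number of batches (K)


def hyperparameter_grid():
    """Yield every combination of the four searched hyperparameters."""
    for lr, mu, lam, k in product(
        LEARNING_RATES, DISTANCE_REGS, DELTA_SCALES, NUM_BATCHES
    ):
        yield {"lr": lr, "mu": mu, "lam": lam, "num_batches": k}


def select_best(configs, evaluate):
    """Return (config, distortion) with the smallest distortion.

    `evaluate` is a hypothetical callable: it should train with the
    given configuration and return the resulting distortion.
    """
    scored = ((config, evaluate(config)) for config in configs)
    return min(scored, key=lambda pair: pair[1])


def early_stopping_epochs(losses, patience=50):
    """Return (best_epoch, stop_epoch) for a sequence of training losses.

    Training stops once `patience` epochs pass without a new best loss;
    the model from `best_epoch` is the one that would be retained.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(losses) - 1
```

The grid contains 3 × 3 × 4 × 5 = 180 configurations, each trained for at most T = 1000 epochs, so early stopping has a direct effect on the total cost of the search.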