Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Fast Hyperboloid Decision Tree Algorithms
Authors: Philippe Chlenski, Ethan Turok, Antonio Khalil Moretti, Itsik Pe'er
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis. Our code can be found at https://github.com/pchlenski/hyperdt. (Section 4: Classification Experiments; Section 4.1: Performance Benchmark Baselines) |
| Researcher Affiliation | Academia | Philippe Chlenski (1), Ethan Turok (1), Antonio Moretti (2), Itsik Pe'er (1); (1) Columbia University, (2) Barnard College |
| Pseudocode | No | The paper describes the algorithms and mathematical formulations in text and equations, but does not provide a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Our code can be found at https://github.com/pchlenski/hyperdt. |
| Open Datasets | Yes | NeuroSEED (Corso et al., 2021) is a method for embedding DNA sequences into a (potentially hyperbolic) latent space by using a Siamese neural network with a distance-preserving loss function... We trained 2, 4, 8, and 16-dimensional Poincaré embeddings of the 1,262,987 16S ribosomal RNA sequences from the Greengenes database (McDonald et al., 2023)... We use Polblogs (Adamic & Glance, 2005), a canonical dataset in the hyperbolic embeddings literature for graph embeddings. |
| Dataset Splits | Yes | We recorded micro- and macro-averaged F1 scores, AUPRs, and run times under 5-fold cross-validation. Cross-validation instances were seeded identically across predictors. |
| Hardware Specification | Yes | Benchmarks were conducted on an Ubuntu 22.04 machine equipped with an Intel Core i7-8700 CPU (6 cores, 3.20 GHz), an NVIDIA GeForce GTX 1080 GPU with 11 GiB of VRAM, and 15 GiB of RAM. Storage was handled by a 2TB HDD and a 219GB SSD. |
| Software Dependencies | Yes | Experiments were implemented using Python 3.11.4, accelerated by CUDA 11.4 with driver version 470.199.02. |
| Experiment Setup | Yes | For all predictors, we use trees with depth 3 and 1 sample per leaf. For random forests, all methods use an ensemble of 12 trees. We explore performance in D = 2, 4, 8, and 16 dimensions. |
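The Pseudocode row notes that the splitting procedure is given only in prose and equations. The following is a minimal pure-Python sketch of one way a hyperboloid decision split can be computed, under stated assumptions. It assumes points lie on the hyperboloid −x_0² + Σ x_i² = −1 with x_0 as the timelike coordinate. It also assumes splits are hyperplanes through the origin whose normal lies in the (x_0, x_d) plane, parameterized by an angle θ. The names `split_angle`, `goes_left`, and `best_split` are hypothetical. The sign convention, the Gini criterion, and the plain angular midpoint between candidate angles are simplifying assumptions and may differ from the hyperdt implementation.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a point is (x_0, x_1, ..., x_D) on -x_0^2 + sum_i x_i^2 = -1, x_0 > 0.
Point = Sequence[float]


def to_hyperboloid(spatial: Sequence[float]) -> List[float]:
    """Lift spacelike coordinates onto the hyperboloid by solving for x_0."""
    x0 = math.sqrt(1.0 + sum(v * v for v in spatial))
    return [x0, *spatial]


def split_angle(x: Point, d: int) -> float:
    """Angle of the hyperplane sin(t)*x_d - cos(t)*x_0 = 0 that passes through x.

    Lies in (pi/4, 3pi/4) because |x_d| < x_0 on the hyperboloid.
    """
    return math.atan2(x[0], x[d])


def goes_left(x: Point, d: int, theta: float) -> bool:
    """Hypothetical convention: left iff the point's angle is below theta."""
    return math.sin(theta) * x[d] - math.cos(theta) * x[0] > 0


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: dict = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X: Sequence[Point], y: Sequence[int]) -> Tuple[int, float, float]:
    """Exhaustive search over spacelike dims d >= 1 and candidate angles.

    Returns (d, theta, weighted Gini); (0, 0.0, inf) if no split exists.
    """
    best = (0, 0.0, float("inf"))
    n = len(X)
    for d in range(1, len(X[0])):
        angles = sorted({split_angle(x, d) for x in X})
        for lo, hi in zip(angles, angles[1:]):
            theta = 0.5 * (lo + hi)  # simplification: plain angular midpoint
            left = [yi for xi, yi in zip(X, y) if goes_left(xi, d, theta)]
            right = [yi for xi, yi in zip(X, y) if not goes_left(xi, d, theta)]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (d, theta, score)
    return best
```

For example, lifting the points (s, 0) for s in (−2, −1, 1, 2) with labels [0, 0, 1, 1], `best_split` separates the classes on d = 1 at θ ≈ π/2 with zero impurity. Because the test depends only on the ratio x_d / x_0, thresholds are angles rather than raw coordinates. Intersecting the hyperboloid with a hyperplane through the origin also gives a totally geodesic submanifold, which makes this family of splits a natural fit for the hyperboloid model.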
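The Open Datasets row mentions Poincaré embeddings of the Greengenes sequences, while the methods in the title operate on the hyperboloid. Embeddings in the Poincaré ball therefore have to be mapped into the hyperboloid model before a hyperboloid classifier can use them. The sketch below uses the standard isometry between the two models and assumes curvature −1. The function names are hypothetical, and this is not necessarily the conversion code used by the authors.

```python
from typing import List, Sequence


def poincare_to_hyperboloid(p: Sequence[float]) -> List[float]:
    """Map a point of the unit Poincare ball (curvature -1) to the hyperboloid.

    x_0 = (1 + |p|^2) / (1 - |p|^2),   x_i = 2 p_i / (1 - |p|^2)
    """
    sq = sum(v * v for v in p)
    if sq >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    denom = 1.0 - sq
    return [(1.0 + sq) / denom, *(2.0 * v / denom for v in p)]


def hyperboloid_to_poincare(x: Sequence[float]) -> List[float]:
    """Inverse map: p_i = x_i / (1 + x_0)."""
    return [v / (1.0 + x[0]) for v in x[1:]]


def minkowski_norm_sq(x: Sequence[float]) -> float:
    """-x_0^2 + sum_i x_i^2; equals -1 for every point on the hyperboloid."""
    return -x[0] ** 2 + sum(v * v for v in x[1:])
```

The ball origin maps to the hyperboloid's apex (1, 0, ..., 0). A D-dimensional Poincaré embedding becomes a (D + 1)-dimensional point whose extra coordinate x_0 is fixed by the others.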
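The Dataset Splits row reports micro- and macro-averaged F1 under 5-fold cross-validation, with folds seeded identically across predictors. Below is a minimal pure-Python sketch of both pieces. The function names are hypothetical. The table does not say whether folds were stratified, so this version uses plain shuffled folds, and AUPR is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 with a fixed seed and return k (train, test) index pairs.

    Passing the same seed for every predictor yields identical splits for all of them.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def f1_micro_macro(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Micro- and macro-averaged F1 for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp: Dict[int, int] = {c: 0 for c in classes}
    fp = dict(tp)
    fn = dict(tp)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)  # unweighted mean over classes
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)  # pooled counts
    return micro, macro
```

For single-label multiclass data, every error counts once as a false positive and once as a false negative, so micro-F1 equals accuracy. Macro-F1 weights each class equally and is more sensitive to rare classes. In practice, scikit-learn's `KFold(n_splits=5, shuffle=True, random_state=seed)` and `f1_score(..., average="micro")` or `average="macro"` offer the same functionality.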