Fast Hyperboloid Decision Tree Algorithms

Authors: Philippe Chlenski, Ethan Turok, Antonio Khalil Moretti, Itsik Pe'er

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis. Our code can be found at https://github.com/pchlenski/hyperdt." (See also Section 4, Classification Experiments, and Section 4.1, Performance Benchmark Baselines.)
Researcher Affiliation | Academia | Philippe Chlenski¹, Ethan Turok¹, Antonio Moretti², Itsik Pe'er¹ (¹Columbia University, ²Barnard College)
Pseudocode | No | The paper describes the algorithms and mathematical formulations in text and equations, but does not provide a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | "Our code can be found at https://github.com/pchlenski/hyperdt."
Open Datasets | Yes | "NeuroSEED (Corso et al., 2021) is a method for embedding DNA sequences into a (potentially hyperbolic) latent space by using a Siamese neural network with a distance-preserving loss function... We trained 2, 4, 8, and 16-dimensional Poincaré embeddings of the 1,262,987 16S ribosomal RNA sequences from the Greengenes database (McDonald et al., 2023)... We use Polblogs (Adamic & Glance, 2005), a canonical dataset in the hyperbolic embeddings literature for graph embeddings." (A sketch for lifting such Poincaré embeddings onto the hyperboloid follows the table.)
Dataset Splits | Yes | "We recorded micro- and macro-averaged F1 scores, AUPRs, and run times under 5-fold cross-validation. Cross-validation instances were seeded identically across predictors." (An identically seeded cross-validation harness is sketched after the table.)
Hardware Specification | Yes | "Benchmarks were conducted on an Ubuntu 22.04 machine equipped with an Intel Core i7-8700 CPU (6 cores, 3.20 GHz), an NVIDIA GeForce GTX 1080 GPU with 11 GiB of VRAM, and 15 GiB of RAM. Storage was handled by a 2 TB HDD and a 219 GB SSD."
Software Dependencies | Yes | "Experiments were implemented using Python 3.11.4, accelerated by CUDA 11.4 with driver version 470.199.02."
Experiment Setup | Yes | "For all predictors, we use trees with depth 3 and 1 sample per leaf. For random forests, all methods use an ensemble of 12 trees. We explore performance in D = 2, 4, 8, and 16 dimensions." (The last sketch below translates these settings into estimator constructors.)
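
The NeuroSEED and Greengenes embeddings quoted in the Open Datasets row live in the Poincaré ball, while the paper's decision trees operate on the hyperboloid (Lorentz) model. Below is a minimal NumPy sketch of the standard curvature -1 isometry between the two models; the function name is ours and is not taken from the hyperdt repository.

```python
import numpy as np

def poincare_to_hyperboloid(y: np.ndarray) -> np.ndarray:
    """Lift points from the Poincare ball to the hyperboloid model.

    y: (n, d) array with ||y|| < 1 per row (curvature -1 assumed).
    Returns an (n, d+1) array x with -x_0^2 + sum_i x_i^2 = -1 and x_0 > 0.
    """
    sq_norm = np.sum(y**2, axis=-1, keepdims=True)  # (n, 1) squared norms
    denom = 1.0 - sq_norm                           # positive inside the ball
    x0 = (1.0 + sq_norm) / denom                    # timelike coordinate
    xi = 2.0 * y / denom                            # spacelike coordinates
    return np.concatenate([x0, xi], axis=-1)

# Sanity check: a 2-D Poincare point lands on the hyperboloid in R^3.
y = np.array([[0.3, -0.4]])
x = poincare_to_hyperboloid(y)
assert np.allclose(-x[:, 0] ** 2 + np.sum(x[:, 1:] ** 2, axis=-1), -1.0)
```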
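
The Dataset Splits row describes 5-fold cross-validation seeded identically across predictors. Here is a minimal harness in that spirit: one fixed seed produces the fold assignments, so every estimator is scored on the same splits. A scikit-learn-style fit/predict interface is assumed; AUPR (average_precision_score on predicted probabilities) is omitted to keep the sketch short.

```python
import time
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def benchmark_predictor(clf, X, y, seed=0):
    """5-fold CV with a fixed seed so all predictors see identical splits.

    Returns per-fold micro/macro F1 scores and fit times for any estimator
    with a scikit-learn-style fit/predict interface (assumed here).
    """
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    records = []
    for train_idx, test_idx in cv.split(X, y):
        start = time.perf_counter()
        clf.fit(X[train_idx], y[train_idx])
        fit_seconds = time.perf_counter() - start
        pred = clf.predict(X[test_idx])
        records.append({
            "f1_micro": f1_score(y[test_idx], pred, average="micro"),
            "f1_macro": f1_score(y[test_idx], pred, average="macro"),
            "fit_seconds": fit_seconds,
        })
    return records

# Illustrative call with synthetic data and a Euclidean stand-in estimator.
X = np.random.default_rng(0).normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
folds = benchmark_predictor(DecisionTreeClassifier(max_depth=3, random_state=0), X, y)
```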
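
Finally, the Experiment Setup row fixes the hyperparameter grid: depth-3 trees, 1 sample per leaf, 12-tree forests, and D = 2, 4, 8, and 16 dimensions. The sketch below wires those settings into estimator constructors; scikit-learn's Euclidean models stand in for the paper's hyperbolic ones, and the data is a synthetic placeholder.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def make_predictors(seed: int = 0) -> dict:
    """Depth-3, 1-sample-per-leaf tree and a 12-tree forest, per the paper."""
    return {
        "tree": DecisionTreeClassifier(
            max_depth=3, min_samples_leaf=1, random_state=seed),
        "forest": RandomForestClassifier(
            n_estimators=12, max_depth=3, min_samples_leaf=1, random_state=seed),
    }

rng = np.random.default_rng(0)
for dim in (2, 4, 8, 16):  # ambient dimensions D benchmarked in the paper
    X = rng.normal(size=(200, dim))   # placeholder features, not hyperbolic data
    y = (X[:, 0] > 0).astype(int)     # placeholder binary labels
    for name, clf in make_predictors().items():
        clf.fit(X, y)
        print(f"D={dim} {name}: train accuracy {clf.score(X, y):.3f}")
```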