Fast Hyperboloid Decision Tree Algorithms

Authors: Philippe Chlenski, Ethan Turok, Antonio Khalil Moretti, Itsik Pe'er

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis. Our code can be found at https://github.com/pchlenski/hyperdt." (See also Section 4, Classification Experiments, and Section 4.1, Performance Benchmark Baselines.)
Researcher Affiliation | Academia | Philippe Chlenski¹, Ethan Turok¹, Antonio Moretti², Itsik Pe'er¹ (¹Columbia University, ²Barnard College)
Pseudocode | No | The paper describes the algorithms and mathematical formulations in text and equations, but does not provide a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | "Our code can be found at https://github.com/pchlenski/hyperdt."
Open Datasets | Yes | "NeuroSEED (Corso et al., 2021) is a method for embedding DNA sequences into a (potentially hyperbolic) latent space by using a Siamese neural network with a distance-preserving loss function... We trained 2, 4, 8, and 16-dimensional Poincaré embeddings of the 1,262,987 16S ribosomal RNA sequences from the Greengenes database (McDonald et al., 2023)... We use Polblogs (Adamic & Glance, 2005), a canonical dataset in the hyperbolic embeddings literature for graph embeddings." (A sketch for lifting such Poincaré embeddings onto the hyperboloid follows the table.)
Dataset Splits | Yes | "We recorded micro- and macro-averaged F1 scores, AUPRs, and run times under 5-fold cross-validation. Cross-validation instances were seeded identically across predictors." (An identically seeded cross-validation harness is sketched after the table.)
Hardware Specification | Yes | "Benchmarks were conducted on an Ubuntu 22.04 machine equipped with an Intel Core i7-8700 CPU (6 cores, 3.20 GHz), an NVIDIA GeForce GTX 1080 GPU with 11 GiB of VRAM, and 15 GiB of RAM. Storage was handled by a 2 TB HDD and a 219 GB SSD."
Software Dependencies | Yes | "Experiments were implemented using Python 3.11.4, accelerated by CUDA 11.4 with driver version 470.199.02."
Experiment Setup | Yes | "For all predictors, we use trees with depth 3 and 1 sample per leaf. For random forests, all methods use an ensemble of 12 trees. We explore performance in D = 2, 4, 8, and 16 dimensions." (The last sketch below translates these settings into estimator constructors.)
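
The NeuroSEED and Greengenes embeddings quoted in the Open Datasets row live in the Poincaré ball, while the paper's decision trees operate on the hyperboloid (Lorentz) model. Below is a minimal NumPy sketch of the standard curvature -1 isometry between the two models; the function name is ours and is not taken from the hyperdt repository.

```python
import numpy as np

def poincare_to_hyperboloid(y: np.ndarray) -> np.ndarray:
    """Lift points from the Poincare ball to the hyperboloid model.

    y: (n, d) array with ||y|| < 1 per row (curvature -1 assumed).
    Returns an (n, d+1) array x with -x_0^2 + sum_i x_i^2 = -1 and x_0 > 0.
    """
    sq_norm = np.sum(y**2, axis=-1, keepdims=True)  # (n, 1) squared norms
    denom = 1.0 - sq_norm                           # positive inside the ball
    x0 = (1.0 + sq_norm) / denom                    # timelike coordinate
    xi = 2.0 * y / denom                            # spacelike coordinates
    return np.concatenate([x0, xi], axis=-1)

# Sanity check: a 2-D Poincare point lands on the hyperboloid in R^3.
y = np.array([[0.3, -0.4]])
x = poincare_to_hyperboloid(y)
assert np.allclose(-x[:, 0] ** 2 + np.sum(x[:, 1:] ** 2, axis=-1), -1.0)
```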
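
The Dataset Splits row describes 5-fold cross-validation seeded identically across predictors. Here is a minimal harness in that spirit: one fixed seed produces the fold assignments, so every estimator is scored on the same splits. A scikit-learn-style fit/predict interface is assumed; AUPR (average_precision_score on predicted probabilities) is omitted to keep the sketch short.

```python
import time
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def benchmark_predictor(clf, X, y, seed=0):
    """5-fold CV with a fixed seed so all predictors see identical splits.

    Returns per-fold micro/macro F1 scores and fit times for any estimator
    with a scikit-learn-style fit/predict interface (assumed here).
    """
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    records = []
    for train_idx, test_idx in cv.split(X, y):
        start = time.perf_counter()
        clf.fit(X[train_idx], y[train_idx])
        fit_seconds = time.perf_counter() - start
        pred = clf.predict(X[test_idx])
        records.append({
            "f1_micro": f1_score(y[test_idx], pred, average="micro"),
            "f1_macro": f1_score(y[test_idx], pred, average="macro"),
            "fit_seconds": fit_seconds,
        })
    return records

# Illustrative call with synthetic data and a Euclidean stand-in estimator.
X = np.random.default_rng(0).normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
folds = benchmark_predictor(DecisionTreeClassifier(max_depth=3, random_state=0), X, y)
```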
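
Finally, the Experiment Setup row fixes the hyperparameter grid: depth-3 trees, 1 sample per leaf, 12-tree forests, and D = 2, 4, 8, and 16 dimensions. The sketch below wires those settings into estimator constructors; scikit-learn's Euclidean models stand in for the paper's hyperbolic ones, and the data is a synthetic placeholder.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def make_predictors(seed: int = 0) -> dict:
    """Depth-3, 1-sample-per-leaf tree and a 12-tree forest, per the paper."""
    return {
        "tree": DecisionTreeClassifier(
            max_depth=3, min_samples_leaf=1, random_state=seed),
        "forest": RandomForestClassifier(
            n_estimators=12, max_depth=3, min_samples_leaf=1, random_state=seed),
    }

rng = np.random.default_rng(0)
for dim in (2, 4, 8, 16):  # ambient dimensions D benchmarked in the paper
    X = rng.normal(size=(200, dim))   # placeholder features, not hyperbolic data
    y = (X[:, 0] > 0).astype(int)     # placeholder binary labels
    for name, clf in make_predictors().items():
        clf.fit(X, y)
        print(f"D={dim} {name}: train accuracy {clf.score(X, y):.3f}")
```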