Fast Hyperboloid Decision Tree Algorithms
Authors: Philippe Chlenski, Ethan Turok, Antonio Khalil Moretti, Itsik Pe'er
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis. Our code can be found at https://github.com/pchlenski/hyperdt. (Section 4: Classification Experiments; Section 4.1: Performance Benchmark Baselines) |
| Researcher Affiliation | Academia | Philippe Chlenski,¹ Ethan Turok,¹ Antonio Moretti,² Itsik Pe'er¹ (¹Columbia University, ²Barnard College) |
| Pseudocode | No | The paper describes the algorithms and mathematical formulations in text and equations, but does not provide a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Our code can be found at https://github.com/pchlenski/hyperdt. |
| Open Datasets | Yes | NeuroSEED (Corso et al., 2021) is a method for embedding DNA sequences into a (potentially hyperbolic) latent space by using a Siamese neural network with a distance-preserving loss function... We trained 2, 4, 8, and 16-dimensional Poincaré embeddings of the 1,262,987 16S ribosomal RNA sequences from the Greengenes database (McDonald et al., 2023)... We use Polblogs (Adamic & Glance, 2005), a canonical dataset in the hyperbolic embeddings literature for graph embeddings. |
| Dataset Splits | Yes | We recorded micro- and macro-averaged F1 scores, AUPRs, and run times under 5-fold cross-validation. Cross-validation instances were seeded identically across predictors. |
| Hardware Specification | Yes | Benchmarks were conducted on an Ubuntu 22.04 machine equipped with an Intel Core i7-8700 CPU (6 cores, 3.20 GHz), an NVIDIA GeForce GTX 1080 GPU with 11 GiB of VRAM, and 15 GiB of RAM. Storage was handled by a 2TB HDD and a 219GB SSD. |
| Software Dependencies | Yes | Experiments were implemented using Python 3.11.4, accelerated by CUDA 11.4 with driver version 470.199.02. |
| Experiment Setup | Yes | For all predictors, we use trees with depth 3 and 1 sample per leaf. For random forests, all methods use an ensemble of 12 trees. We explore performance in D = 2, 4, 8, and 16 dimensions. |
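
The "Dataset Splits" and "Experiment Setup" rows describe a concrete protocol: seeded 5-fold cross-validation, depth-3 trees with one sample per leaf, 12-tree forests, and dimensions D = 2, 4, 8, and 16. The sketch below is a minimal, hypothetical reconstruction of that protocol for the Euclidean scikit-learn baselines only; the synthetic dataset is a stand-in for the paper's datasets, and the hyperbolic predictors from the released code are not shown.

```python
# Sketch of the quoted benchmark protocol: seeded 5-fold CV,
# depth-3 trees with 1 sample per leaf, and 12-tree random forests.
# Only the Euclidean scikit-learn baselines are illustrated here.
import numpy as np
from sklearn.datasets import make_classification  # stand-in for the paper's datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=16, random_state=0)

predictors = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, min_samples_leaf=1),
    "random_forest": RandomForestClassifier(
        n_estimators=12, max_depth=3, min_samples_leaf=1, random_state=0
    ),
}

# Identical seeding across predictors: the same fold indices are reused for each model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(cv.split(X, y))

for name, clf in predictors.items():
    micro, macro = [], []
    for train_idx, test_idx in folds:
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        micro.append(f1_score(y[test_idx], pred, average="micro"))
        macro.append(f1_score(y[test_idx], pred, average="macro"))
    print(f"{name}: micro-F1={np.mean(micro):.3f}, macro-F1={np.mean(macro):.3f}")
```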
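
The "Open Datasets" row mentions Poincaré embeddings, while the paper's decision trees operate in the hyperboloid (Lorentz) model. The standard lift from the Poincaré ball to the hyperboloid is sketched below as an illustrative helper; the function name is hypothetical, and whether the released hyperdt code ships its own conversion utility is not confirmed here.

```python
import numpy as np

def poincare_to_hyperboloid(x: np.ndarray) -> np.ndarray:
    """Lift points from the Poincare ball (rows of x, norm < 1) to the
    hyperboloid (Lorentz) model with curvature -1. Illustrative helper only."""
    sq_norm = np.sum(x**2, axis=-1, keepdims=True)  # ||x||^2 per point
    denom = 1.0 - sq_norm                           # positive inside the ball
    x0 = (1.0 + sq_norm) / denom                    # timelike coordinate
    xs = 2.0 * x / denom                            # spacelike coordinates
    return np.concatenate([x0, xs], axis=-1)

# Sanity check: the Minkowski product -x0^2 + ||xs||^2 should equal -1.
pts = np.random.default_rng(0).uniform(-0.3, 0.3, size=(5, 2))
z = poincare_to_hyperboloid(pts)
minkowski = -z[:, 0] ** 2 + np.sum(z[:, 1:] ** 2, axis=1)
print(np.allclose(minkowski, -1.0))  # True
```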