Neural Tangent Kernels for Axis-Aligned Tree Ensembles

Authors: Ryuichi Kanoh, Mahito Sugiyama

ICML 2024

Each reproducibility variable is listed below with its result and the supporting excerpt (LLM response) drawn from the paper.
Research Type: Experimental
Evidence: "Our numerical experiments show a variety of suitable features depending on the type of constraints. Our NTK analysis highlights both the theoretical and practical impacts of the axis-aligned constraint in tree ensemble learning."
Researcher Affiliation: Academia
Evidence: "National Institute of Informatics; The Graduate University for Advanced Studies, SOKENDAI."
Pseudocode: No
Evidence: The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: Yes
Evidence: "The implementation we used in our numerical experiments is available online: https://github.com/ryuichi0704/aa-tntk"
Open Datasets: Yes
Evidence: "We use EasyMKL (Aiolli & Donini, 2015), a convex approach that identifies kernel combinations maximizing the margin between classes. Figure 6 displays the weights obtained by EasyMKL on the entire tic-tac-toe dataset preprocessed by Fernández-Delgado et al. (2014). [...] In this experiment, we used the diabetes dataset (https://archive.ics.uci.edu/dataset/34/diabetes), a commonly used real-world dataset for regression tasks..."
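
To make the multiple-kernel setup concrete, here is a minimal sketch (not the authors' code) of training an SVM on a weighted combination of candidate kernel matrices. RBF kernels at several scales stand in for the paper's axis-aligned tree NTKs, the data is synthetic, and the uniform weights are a placeholder for the margin-maximizing weights that EasyMKL would learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the tic-tac-toe data (9 features, binary labels).
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate kernels; RBF scales stand in for the tree NTKs being combined.
gammas = [0.1, 1.0, 10.0]
K_tr_list = [rbf_kernel(X_tr, X_tr, gamma=g) for g in gammas]
K_te_list = [rbf_kernel(X_te, X_tr, gamma=g) for g in gammas]

# EasyMKL would learn margin-maximizing weights; uniform weights are a
# placeholder so the sketch runs end to end.
w = np.ones(len(gammas)) / len(gammas)
K_tr = sum(wi * Ki for wi, Ki in zip(w, K_tr_list))
K_te = sum(wi * Ki for wi, Ki in zip(w, K_te_list))

svm = SVC(kernel="precomputed", C=1.0).fit(K_tr, y_tr)
print("test accuracy:", svm.score(K_te, y_te))
```
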
Dataset Splits: Yes
Evidence: "Figure 7 displays the results of four-fold cross-validation, where 25 percent of the total amount of data were used for training and the remainder for evaluation."
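
Note that this protocol inverts the usual cross-validation roles: each 25% fold is used for training and the remaining 75% for evaluation. A minimal sketch of that split, using sklearn's KFold with ridge regression on sklearn's built-in diabetes loader as a stand-in model (the UCI dataset cited above is a different source):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)
scores = []
# KFold yields (large, small) = (75%, 25%) index sets; naming them
# (eval_idx, train_idx) trains on the 25% fold, as in the quoted protocol.
for eval_idx, train_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[eval_idx], y[eval_idx]))
print("per-fold R^2:", scores)
```
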
Hardware Specification: No
Evidence: The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies: No
Evidence: The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers).
Experiment Setup: Yes
Evidence: "We set α = 2.0 and β = 0.5. [...] The models with M = 16 and 1024 are trained using full-batch gradient descent with a learning rate of 0.1. [...] Kernel parameters were set with α in {0.5, 1.0, 2.0, 4.0} and β in {0.1, 0.5, 1.0}. We used the regularization strength C = 1.0 in SVMs. For RF/GBDT, the number of weak learners is set to 1000."
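
A minimal sketch of this setup under stated assumptions: `tree_ntk` below is a hypothetical placeholder for the paper's axis-aligned tree NTK (the actual kernel is in the authors' repository), implemented here as an RBF kernel plus a constant so that the (α, β) grid and the C = 1.0 SVMs run end to end, alongside RF/GBDT baselines with 1000 weak learners.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def tree_ntk(A, B, alpha, beta):
    # Hypothetical placeholder, NOT the paper's kernel: RBF plus a constant
    # (both PSD), so the Gram matrices below are valid SVM inputs.
    return rbf_kernel(A, B, gamma=alpha) + beta

# Kernel-parameter grid from the quoted setup.
for alpha, beta in product([0.5, 1.0, 2.0, 4.0], [0.1, 0.5, 1.0]):
    K_tr = tree_ntk(X_tr, X_tr, alpha, beta)
    K_te = tree_ntk(X_te, X_tr, alpha, beta)
    acc = SVC(kernel="precomputed", C=1.0).fit(K_tr, y_tr).score(K_te, y_te)
    print(f"alpha={alpha}, beta={beta}: accuracy={acc:.3f}")

# RF/GBDT baselines with 1000 weak learners, as in the quoted setup.
for model in (RandomForestClassifier(n_estimators=1000, random_state=0),
              GradientBoostingClassifier(n_estimators=1000, random_state=0)):
    print(type(model).__name__, model.fit(X_tr, y_tr).score(X_te, y_te))
```
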