Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel

Authors: Ryuichi Kanoh, Mahito Sugiyama

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally examined the effects of the degeneracy phenomenon discussed in Section 4.2. Setup. We used 90 classification tasks in the UCI database (Dua & Graff, 2017), each of which has fewer than 5000 data points as in (Arora et al., 2020). We performed kernel regression using the limiting NTK defined in Equation 5 and Equation 11, equivalent to the infinite ensemble of the perfect binary trees and decision lists. ... Performance. Figure 8 shows the averaged performance in classification accuracy on 90 datasets."
Researcher Affiliation | Academia | Ryuichi Kanoh¹,², Mahito Sugiyama¹,² (¹National Institute of Informatics; ²The Graduate University for Advanced Studies, SOKENDAI)
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | "REPRODUCIBILITY STATEMENT: Proofs are provided in the Appendix. For numerical experiments and figures, reproducible source codes are shared in the supplementary material."
Open Datasets | Yes | "We used 90 classification tasks in the UCI database (Dua & Graff, 2017)."
Dataset Splits | Yes | "We report four-fold cross-validation performance with random data splitting as in Arora et al. (2020) and Fernández-Delgado et al. (2014)."
Hardware Specification | Yes | "We ran all experiments on 2.20 GHz Intel Xeon E5-2698 CPU and 252 GB of memory with Ubuntu Linux (version: 4.15.0-117-generic)."
Software Dependencies | No | "We used scikit-learn to perform kernel regression. We used scikit-learn for the implementation." (No specific version numbers are provided for these software components.)
Experiment Setup | Yes | "We used D in {2, 4, 8, 16, 32, 64, 128} and α in {1.0, 2.0, 4.0, 8.0, 16.0, 32.0}. The scaled error function is used as a decision function. To consider the ridge-less situation, regularization strength is fixed to 1.0 × 10⁻⁸. As for hyperparameters, we used max_depth in {2, 4, 6}, subsample in {0.6, 0.8, 1.0}, learning_rate in {0.1, 0.01, 0.001}, and n_estimators (the number of trees) in {100, 300, 500}." (Sketches of this setup, under stated assumptions, follow the table.)
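
The kernel-regression pipeline quoted in the Research Type, Dataset Splits, and Experiment Setup rows can be pictured as follows. This is a minimal sketch, not the authors' supplementary code: tree_ntk is a hypothetical placeholder (a plain linear kernel) standing in for the limiting NTK of Equations 5 and 11, and scikit-learn's KernelRidge with a precomputed kernel plays the role of the near ridge-less regression with regularization fixed to 1.0 × 10⁻⁸.

```python
# Sketch only, not the authors' released code: `tree_ntk` is a stand-in for
# the paper's limiting NTK (Equations 5 and 11), which is not reproduced here.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelBinarizer

def tree_ntk(X_a, X_b, depth=8, alpha=2.0):
    """Placeholder Gram matrix; a real run would evaluate the closed-form
    limiting NTK for depth-`depth` trees with scaling `alpha` instead."""
    return X_a @ X_b.T  # linear kernel used purely as an illustrative stand-in

def cv_accuracy(X, y, depth, alpha, n_splits=4, seed=0):
    """Four-fold cross-validation accuracy of (near) ridge-less kernel
    regression, with the regularization strength fixed to 1e-8."""
    labels = LabelBinarizer().fit_transform(y)   # one-hot regression targets
    if labels.shape[1] == 1:                     # binary case -> two columns
        labels = np.hstack([1 - labels, labels])
    accs = []
    splitter = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr, te in splitter.split(X):
        model = KernelRidge(alpha=1e-8, kernel="precomputed")
        model.fit(tree_ntk(X[tr], X[tr], depth, alpha), labels[tr])
        pred = model.predict(tree_ntk(X[te], X[tr], depth, alpha)).argmax(axis=1)
        accs.append(np.mean(pred == labels[te].argmax(axis=1)))
    return float(np.mean(accs))
```

Under this reading, sweeping D over {2, 4, 8, 16, 32, 64, 128} and α over {1.0, 2.0, 4.0, 8.0, 16.0, 32.0} amounts to calling cv_accuracy once per (D, α) pair on each UCI task.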
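
The max_depth / subsample / learning_rate / n_estimators grid in the last row describes a finite tree-ensemble baseline. The excerpt does not name the boosting implementation used, so the sketch below assumes scikit-learn's GradientBoostingClassifier wrapped in GridSearchCV purely for illustration.

```python
# Sketch only: the choice of GradientBoostingClassifier is an assumption,
# since the excerpt does not identify the boosting library actually used.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_depth": [2, 4, 6],
    "subsample": [0.6, 0.8, 1.0],
    "learning_rate": [0.1, 0.01, 0.001],
    "n_estimators": [100, 300, 500],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid=param_grid,
    cv=4,                  # mirrors the four-fold cross-validation above
    scoring="accuracy",
)
# search.fit(X, y)         # X, y: features/labels of one UCI task
# search.best_params_      # selected hyperparameters for that task
```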