Trees with Attention for Set Prediction Tasks
Authors: Roy Hirsch, Ran Gilad-Bachrach
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the new method empirically on a wide range of problems, ranging from making predictions on sub-atomic particle jets to estimating the redshift of galaxies. The new method outperforms existing tree-based methods consistently and significantly. Moreover, it is competitive with, and often outperforms, Deep Learning. We also discuss the theoretical properties of Set-Trees and explain how they enable item-level explainability. |
| Researcher Affiliation | Academia | ¹Department of EE, Tel-Aviv University, Israel; ²Department of Bio-Medical Engineering, Tel-Aviv University, Israel, and the Edmond J. Safra Center for Bioinformatics. Correspondence to: Roy Hirsch <royhirsch@mail.tau.ac.il>, Ran Gilad-Bachrach <rgb@tauex.tau.ac.il>. |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We release an implementation of our proposed method for the community and for reproducibility. The code is available at: https://github.com/TAU-MLwell/Set-Tree. |
| Open Datasets | Yes | Data were generated using the MIMIC-III database (Johnson et al., 2016)... We used two popular jet classification datasets: Quark Gluon tagging (Komiske et al., 2019) and Top Tagging (Kasieczka et al., 2019)... We experimented with a multi-class point cloud classification task based on the ModelNet40 dataset (Wu et al., 2015)... We used the poker hands dataset introduced in Cattral et al. (2002). |
| Dataset Splits | Yes | For all the tree-based models (GBT and GBeST) and for all the tasks, we scanned a pre-defined hyperparameter search space. We used 10% of the training data as validation for hyperparameter tuning... The training set consisted of 100K records... The data were split into 1.6M/200K/200K records for train/validation/test. The Top Tagging dataset included jets derived from hadronically decaying top quarks... The data were split into 1.2M/400K/400K for train/validation/test. (A split sketch follows this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or specific computer specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like XGBoost, CatBoost, LightGBM, and the Adam optimizer but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For all the tree-based models (GBT and GBeST) and for all the tasks, we scanned a pre-defined hyperparameter search space... The trees' maximal depth was chosen within {5, 6, 8, 10}, the number of estimators was chosen within {50, 100, 200, 300}, and the learning rate was chosen within {0.2, 0.1, 0.05}. We also applied known tree regularization techniques: the fraction of train records sampled per tree was chosen within {1, 0.8, 0.5} and the fraction of features sampled per tree was chosen within {1, 0.8, 0.5} (where 1 means using all records or features, respectively). All DNN-based models were trained using the Adam optimizer (Kingma & Ba, 2014) and a learning rate of 1e-3. We used early stopping while monitoring the validation loss with a patience of 3 epochs. (Grid-search and training sketches follow this table.) |
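
The Dataset Splits row quotes fixed train/validation/test partitions (e.g., the 1.6M/200K/200K Quark-Gluon split, an 80/10/10 partition) plus a further 10% validation hold-out used to tune the tree-based models. Below is a minimal sketch of such a split with scikit-learn; the stand-in data, variable names, and random seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Tiny illustrative stand-in for any of the paper's datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 16))
y = rng.integers(0, 2, size=10_000)

# The quoted 1.6M/200K/200K Quark-Gluon split is an 80/10/10 partition:
# first carve off 20%, then split that half-and-half into val/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# For the tree-based models, 10% of the training data is further held
# out as a validation set for hyperparameter tuning, per the row above.
X_fit, X_hpval, y_fit, y_hpval = train_test_split(X_train, y_train, test_size=0.1, random_state=0)
```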
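
The Experiment Setup row spells out a concrete hyperparameter grid for the tree-based models. The sketch below scans an analogous grid with scikit-learn's `ParameterGrid`, using `GradientBoostingClassifier` as a stand-in for the paper's GBT/GBeST models (the authors' actual implementation is in the repository linked above); the toy dataset and validation-accuracy selection criterion are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # stand-in, not the paper's GBeST
from sklearn.model_selection import ParameterGrid, train_test_split

# Tiny illustrative dataset (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
y = rng.integers(0, 2, size=500)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

# Hyperparameter search space quoted in the Experiment Setup row.
grid = ParameterGrid({
    "max_depth": [5, 6, 8, 10],
    "n_estimators": [50, 100, 200, 300],
    "learning_rate": [0.2, 0.1, 0.05],
    "subsample": [1.0, 0.8, 0.5],     # fraction of train records sampled per tree
    "max_features": [1.0, 0.8, 0.5],  # fraction of features sampled per tree
})

# Exhaustively fit each configuration and keep the best validation score.
best_score, best_params = -float("inf"), None
for params in grid:
    model = GradientBoostingClassifier(random_state=0, **params)
    model.fit(X_fit, y_fit)
    score = model.score(X_val, y_val)  # validation accuracy drives model selection
    if score > best_score:
        best_score, best_params = score, params
print(best_params, best_score)
```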
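
For the DNN baselines, the row quotes only the optimizer and early-stopping settings: Adam with a learning rate of 1e-3, and early stopping on validation loss with a patience of 3 epochs. A hedged Keras sketch of that training configuration follows; the architecture, data, and loss are placeholders, since the quoted setup does not specify them.

```python
import numpy as np
import tensorflow as tf

# Tiny illustrative dataset (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 16)).astype("float32")
y = rng.integers(0, 2, size=1_000).astype("float32")

# Placeholder architecture; the Experiment Setup row does not specify one.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Quoted settings: Adam optimizer, learning rate 1e-3, early stopping
# monitoring the validation loss with a patience of 3 epochs.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy")
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
model.fit(X, y, validation_split=0.1, epochs=100, callbacks=[early_stop])
```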