Treeffuser: probabilistic prediction via conditional diffusions with gradient-boosted trees
Authors: Nicolas Beltran-Velez, Alessandro A. Grande, Achille Nazaret, Alp Kucukelbir, David Blei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better calibrated probabilistic predictions. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, Columbia University, New York, USA; (2) Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, USA; (3) Irving Institute for Cancer Dynamics, Columbia University, New York, USA; (4) Fero Labs, New York, USA; (5) Department of Statistics, Columbia University, New York, USA |
| Pseudocode | Yes | Algorithm 1: Treeffuser Training |
| Open Source Code | Yes | We implement Treeffuser in https://github.com/blei-lab/treeffuser. (A usage sketch follows the table.) |
| Open Datasets | Yes | We compare Treeffuser with state-of-the-art methods for probabilistic predictions on standard UCI datasets [51]. |
| Dataset Splits | Yes | We performed 10-fold cross-validation. For each fold, we tuned the hyperparameters of the methods using Bayesian optimization for 25 iterations, using 20% of the current fold's training data as a validation set. (A protocol sketch follows the table.) |
| Hardware Specification | Yes | We conducted all of these experiments on a 2020 MacBook Pro with a 2.6 GHz 6-Core Intel Core i7 processor. |
| Software Dependencies | No | The paper mentions using LightGBM and XGBoost but does not provide specific version numbers for these software dependencies or other libraries. |
| Experiment Setup | Yes | We provide a short description of each hyperparameter of the model alongside its default value. Treeffuser uses LightGBM [22] to learn the GBTs. `n_estimators` (3000): Specifies the maximum number of trees that will be fit, regardless of whether the stopping criterion is met. `learning_rate` (0.1): Specifies the shrinkage to use for every tree. `num_leaves` (31): Specifies the maximum number of leaves a tree can have. `early_stopping_rounds` (50): Specifies how long to wait without a validation-loss improvement before stopping. `n_repeats` (30): Specifies how many Monte Carlo samples to draw per data point to estimate the expectation E_{t,ζ} in Eq. (9). (A configuration sketch follows the table.) |
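The repository linked in the Open Source Code row ships Treeffuser as a Python package. Below is a minimal usage sketch, assuming the `Treeffuser` estimator exposes scikit-learn-style `fit` and `sample` methods as described in the repository README; the synthetic data here is purely illustrative and is not from the paper.

```python
import numpy as np
from treeffuser import Treeffuser  # pip install treeffuser

# Illustrative 1-D regression data with heteroscedastic noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.1 * X[:, 0])

model = Treeffuser()
model.fit(X, y)

# Draw Monte Carlo samples from the predictive distribution p(y | x);
# per the repository README, the first axis indexes the samples.
X_test = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
samples = model.sample(X_test, n_samples=100)

# Summarize the samples however the task requires (mean, quantiles, ...).
y_mean = samples.mean(axis=0)
y_lo, y_hi = np.quantile(samples, [0.05, 0.95], axis=0)
```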
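The split protocol quoted in the Dataset Splits row can be reproduced with standard tooling. A sketch under stated assumptions: scikit-learn supplies the folds and the inner split, and `tune_and_evaluate` is a hypothetical placeholder for the paper's Bayesian-optimization loop (25 iterations), which the excerpt does not spell out in code.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def run_protocol(X, y, tune_and_evaluate, seed=0):
    """10-fold CV; within each fold, 20% of the fold's training data
    is held out as a validation set for hyperparameter tuning.
    X and y are NumPy arrays."""
    scores = []
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # 20% of the current fold's training data serves as validation data.
        X_fit, X_val, y_fit, y_val = train_test_split(
            X_tr, y_tr, test_size=0.2, random_state=seed
        )
        # Hypothetical: run 25 iterations of Bayesian optimization against
        # (X_val, y_val), refit with the best settings, score on (X_te, y_te).
        scores.append(tune_and_evaluate(X_fit, y_fit, X_val, y_val, X_te, y_te))
    return scores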
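The defaults listed in the Experiment Setup row map onto LightGBM-style keyword arguments. A hedged configuration sketch, assuming `Treeffuser` forwards these names to its LightGBM backend; the exact constructor signature may differ from what is shown here.

```python
from treeffuser import Treeffuser

# Default values quoted from the paper; parameter names follow LightGBM's
# conventions, which the paper says Treeffuser builds on.
model = Treeffuser(
    n_estimators=3000,         # max number of trees, regardless of early stopping
    learning_rate=0.1,         # shrinkage applied to every tree
    num_leaves=31,             # max leaves per tree
    early_stopping_rounds=50,  # patience on the validation loss
    n_repeats=30,              # Monte Carlo samples per data point for E_{t,ζ} in Eq. (9)
)
```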