Treeffuser: probabilistic prediction via conditional diffusions with gradient-boosted trees

Authors: Nicolas Beltran-Velez, Alessandro A. Grande, Achille Nazaret, Alp Kucukelbir, David Blei

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better-calibrated probabilistic predictions.
Researcher Affiliation | Collaboration | 1. Department of Computer Science, Columbia University, New York, USA; 2. Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, USA; 3. Irving Institute for Cancer Dynamics, Columbia University, New York, USA; 4. Fero Labs, New York, USA; 5. Department of Statistics, Columbia University, New York, USA
Pseudocode | Yes | Algorithm 1: Treeffuser Training (a schematic sketch of the training loop follows the table)
Open Source Code | Yes | We implement Treeffuser in https://github.com/blei-lab/treeffuser.
Open Datasets | Yes | We compare Treeffuser with state-of-the-art methods for probabilistic predictions on standard UCI datasets [51].
Dataset Splits | Yes | We performed 10-fold cross-validation. For each fold, we tuned the hyperparameters of the methods using Bayesian optimization for 25 iterations, using 20% of the current fold's training data as a validation set. (protocol sketched after the table)
Hardware Specification | Yes | We conducted all of these experiments on a 2020 MacBook Pro with a 2.6 GHz 6-core Intel Core i7 processor.
Software Dependencies | No | The paper mentions using LightGBM and XGBoost but does not provide specific version numbers for these or other libraries.
Experiment Setup | Yes | We provide a short description of each hyperparameter of the model alongside its default value. Treeffuser uses LightGBM [22] to learn the GBTs. n_estimators (3000): the maximum number of trees that will be fit, regardless of whether the stopping criterion is met. learning_rate (0.1): the shrinkage to use for every tree. num_leaves (31): the maximum number of leaves a tree can have. early_stopping_rounds (50): how long to wait without a validation-loss improvement before stopping. n_repeats (30): how many Monte Carlo samples to draw per data point to estimate the expectation E_{t,ζ} in Eq. (9). (a usage sketch with these defaults follows the table)
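For orientation, here is a schematic sketch of the training loop that Algorithm 1 describes: for each data point, draw n_repeats Monte Carlo pairs (t, ζ), diffuse the response, and fit a LightGBM regressor on the augmented data. This is a sketch under stated assumptions, not the authors' implementation: the variance-exploding schedule sigma(t) and the noise-prediction target are generic diffusion choices standing in for the paper's Eq. (9).

```python
# Schematic sketch of Treeffuser training (Algorithm 1), not the
# authors' implementation. Assumptions: a variance-exploding noise
# schedule and a noise-prediction regression target, both generic
# diffusion choices standing in for the paper's Eq. (9).
import numpy as np
import lightgbm as lgb

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    # Hypothetical VE schedule; the paper defines its own.
    return sigma_min * (sigma_max / sigma_min) ** t

def train_treeffuser_sketch(X, y, n_repeats=30, **lgb_params):
    n = X.shape[0]
    rows, targets = [], []
    for _ in range(n_repeats):  # Monte Carlo estimate of E_{t,ζ}
        t = np.random.uniform(size=n)    # diffusion times
        zeta = np.random.normal(size=n)  # Gaussian noise
        y_t = y + sigma(t) * zeta        # diffused responses
        rows.append(np.column_stack([X, y_t, t]))  # features (x, y_t, t)
        targets.append(zeta)  # predict the noise; the score is
                              # then -prediction / sigma(t)
    model = lgb.LGBMRegressor(**lgb_params)
    model.fit(np.vstack(rows), np.concatenate(targets))
    return model
```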
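The Dataset Splits row describes a 10-fold cross-validation protocol with per-fold tuning. Below is a minimal sketch of that protocol using scikit-learn; the tune callback is a placeholder for the 25 iterations of Bayesian optimization, and refitting on the full fold training data after tuning is an assumption, not something the excerpt specifies.

```python
# Sketch of the evaluation protocol from the Dataset Splits row:
# 10-fold CV with 20% of each fold's training data held out for
# tuning. `tune` is a placeholder for the 25 iterations of Bayesian
# optimization; refitting on the full fold after tuning is an
# assumption, not stated in the excerpt.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def cross_validate(X, y, make_model, tune, score):
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    fold_scores = []
    for train_idx, test_idx in kf.split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # Hold out 20% of this fold's training data for validation.
        X_fit, X_val, y_fit, y_val = train_test_split(
            X_tr, y_tr, test_size=0.2, random_state=0
        )
        best_params = tune(X_fit, y_fit, X_val, y_val)
        model = make_model(**best_params)
        model.fit(X_tr, y_tr)
        fold_scores.append(score(model, X_te, y_te))
    return float(np.mean(fold_scores))
```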
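Finally, the Experiment Setup defaults map directly onto the released package. Here is a minimal usage sketch, assuming the Treeffuser class and its fit/sample interface as shown in the repository's README; the constructor arguments mirror the defaults listed above but should be verified against the installed version.

```python
# Minimal usage sketch, assuming the fit/sample interface shown in
# the README of blei-lab/treeffuser; verify the constructor
# arguments against the installed version.
import numpy as np
from treeffuser import Treeffuser

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                     # toy features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)  # toy responses

model = Treeffuser(
    n_estimators=3000,         # max trees, regardless of early stopping
    learning_rate=0.1,         # shrinkage per tree
    num_leaves=31,             # max leaves per tree
    early_stopping_rounds=50,  # patience on validation loss
    n_repeats=30,              # Monte Carlo samples per data point
)
model.fit(X, y)

# Probabilistic prediction: draw samples from p(y | x).
y_samples = model.sample(X[:5], n_samples=100)
print(y_samples.shape)
```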