Treeffuser: probabilistic prediction via conditional diffusions with gradient-boosted trees

Authors: Nicolas Beltran-Velez, Alessandro A. Grande, Achille Nazaret, Alp Kucukelbir, David Blei

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better-calibrated probabilistic predictions.
Researcher Affiliation | Collaboration | 1. Department of Computer Science, Columbia University, New York, USA; 2. Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, USA; 3. Irving Institute for Cancer Dynamics, Columbia University, New York, USA; 4. Fero Labs, New York, USA; 5. Department of Statistics, Columbia University, New York, USA
Pseudocode | Yes | Algorithm 1: Treeffuser Training (a schematic sketch of the training loop follows the table)
Open Source Code | Yes | We implement Treeffuser in https://github.com/blei-lab/treeffuser.
Open Datasets | Yes | We compare Treeffuser with state-of-the-art methods for probabilistic predictions on standard UCI datasets [51].
Dataset Splits | Yes | We performed 10-fold cross-validation. For each fold, we tuned the hyperparameters of the methods using Bayesian optimization for 25 iterations, using 20% of the current fold's training data as a validation set. (protocol sketched after the table)
Hardware Specification | Yes | We conducted all of these experiments on a 2020 MacBook Pro with a 2.6 GHz 6-core Intel Core i7 processor.
Software Dependencies | No | The paper mentions using LightGBM and XGBoost but does not provide specific version numbers for these or other libraries.
Experiment Setup | Yes | We provide a short description of each hyperparameter of the model alongside its default value. Treeffuser uses LightGBM [22] to learn the GBTs. n_estimators (3000): the maximum number of trees that will be fit, regardless of whether the stopping criterion is met. learning_rate (0.1): the shrinkage to use for every tree. num_leaves (31): the maximum number of leaves a tree can have. early_stopping_rounds (50): how long to wait without a validation-loss improvement before stopping. n_repeats (30): how many Monte Carlo samples to draw per data point to estimate the expectation E_{t,ζ} in Eq. (9). (a usage sketch with these defaults follows the table)
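For orientation, here is a schematic sketch of the training loop that Algorithm 1 describes: for each data point, draw n_repeats Monte Carlo pairs (t, ζ), diffuse the response, and fit a LightGBM regressor on the augmented data. This is a sketch under stated assumptions, not the authors' implementation: the variance-exploding schedule sigma(t) and the noise-prediction target are generic diffusion choices standing in for the paper's Eq. (9).

```python
# Schematic sketch of Treeffuser training (Algorithm 1), not the
# authors' implementation. Assumptions: a variance-exploding noise
# schedule and a noise-prediction regression target, both generic
# diffusion choices standing in for the paper's Eq. (9).
import numpy as np
import lightgbm as lgb

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    # Hypothetical VE schedule; the paper defines its own.
    return sigma_min * (sigma_max / sigma_min) ** t

def train_treeffuser_sketch(X, y, n_repeats=30, **lgb_params):
    n = X.shape[0]
    rows, targets = [], []
    for _ in range(n_repeats):  # Monte Carlo estimate of E_{t,ζ}
        t = np.random.uniform(size=n)    # diffusion times
        zeta = np.random.normal(size=n)  # Gaussian noise
        y_t = y + sigma(t) * zeta        # diffused responses
        rows.append(np.column_stack([X, y_t, t]))  # features (x, y_t, t)
        targets.append(zeta)  # predict the noise; the score is
                              # then -prediction / sigma(t)
    model = lgb.LGBMRegressor(**lgb_params)
    model.fit(np.vstack(rows), np.concatenate(targets))
    return model
```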
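The Dataset Splits row describes a 10-fold cross-validation protocol with per-fold tuning. Below is a minimal sketch of that protocol using scikit-learn; the tune callback is a placeholder for the 25 iterations of Bayesian optimization, and refitting on the full fold training data after tuning is an assumption, not something the excerpt specifies.

```python
# Sketch of the evaluation protocol from the Dataset Splits row:
# 10-fold CV with 20% of each fold's training data held out for
# tuning. `tune` is a placeholder for the 25 iterations of Bayesian
# optimization; refitting on the full fold after tuning is an
# assumption, not stated in the excerpt.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def cross_validate(X, y, make_model, tune, score):
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    fold_scores = []
    for train_idx, test_idx in kf.split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # Hold out 20% of this fold's training data for validation.
        X_fit, X_val, y_fit, y_val = train_test_split(
            X_tr, y_tr, test_size=0.2, random_state=0
        )
        best_params = tune(X_fit, y_fit, X_val, y_val)
        model = make_model(**best_params)
        model.fit(X_tr, y_tr)
        fold_scores.append(score(model, X_te, y_te))
    return float(np.mean(fold_scores))
```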
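Finally, the Experiment Setup defaults map directly onto the released package. Here is a minimal usage sketch, assuming the Treeffuser class and its fit/sample interface as shown in the repository's README; the constructor arguments mirror the defaults listed above but should be verified against the installed version.

```python
# Minimal usage sketch, assuming the fit/sample interface shown in
# the README of blei-lab/treeffuser; verify the constructor
# arguments against the installed version.
import numpy as np
from treeffuser import Treeffuser

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                     # toy features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)  # toy responses

model = Treeffuser(
    n_estimators=3000,         # max trees, regardless of early stopping
    learning_rate=0.1,         # shrinkage per tree
    num_leaves=31,             # max leaves per tree
    early_stopping_rounds=50,  # patience on validation loss
    n_repeats=30,              # Monte Carlo samples per data point
)
model.fit(X, y)

# Probabilistic prediction: draw samples from p(y | x).
y_samples = model.sample(X[:5], n_samples=100)
print(y_samples.shape)
```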