Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
NGBoost: Natural Gradient Boosting for Probabilistic Prediction
Authors: Tony Duan, Anand Avati, Daisy Yi Ding, Khanh K. Thai, Sanjay Basu, Andrew Ng, Alejandro Schuler
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments use datasets from the UCI Machine Learning Repository, and follow the protocol first proposed in Hernández-Lobato and Adams (2015). For all datasets, we hold out a random 10% of the examples as a test set. From the other 90% we initially hold out 20% as a validation set to select M (the number of boosting stages) that gives the best log-likelihood, and then retrain on the entire 90% using the chosen M. The retrained model is then made to predict on the held-out 10% test set. This entire process is repeated 20 times for all datasets except Protein and Year MSD, for which it is repeated 5 times and 1 time respectively. |
| Researcher Affiliation | Collaboration | ¹Stanford University, Stanford, California, United States; ²Unlearn.ai, San Francisco, California, United States; ³Harvard Medical School, Cambridge, Massachusetts, United States. |
| Pseudocode | Yes | Algorithm 1 NGBoost for probabilistic prediction |
| Open Source Code | Yes | An open-source implementation is available at github.com/stanfordmlgroup/ngboost. |
| Open Datasets | Yes | Our experiments use datasets from the UCI Machine Learning Repository, and follow the protocol first proposed in Hernández-Lobato and Adams (2015). |
| Dataset Splits | Yes | For all datasets, we hold out a random 10% of the examples as a test set. From the other 90% we initially hold out 20% as a validation set to select M (the number of boosting stages) that gives the best log-likelihood, and then retrain on the entire 90% using the chosen M. |
| Hardware Specification | No | The paper discusses computational aspects like mini-batching and scalability to large datasets, but it does not provide specific details on the hardware used, such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using 'Scikit-Learn implementation' for comparison methods but does not provide specific version numbers for Scikit-Learn or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, NGBoost was configured with the Normal distribution, decision tree base learner with a maximum depth of three levels, and log scoring rule. The Year MSD dataset, being extremely large relative to the rest, was fit using a learning rate η of 0.1 while the rest of the datasets were fit with a learning rate of 0.01. In general we recommend small learning rates, subject to computational feasibility. For the Year MSD dataset we use a mini-batch size of 10%, for all other datasets we use 100%. |
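
The split protocol and per-dataset settings quoted in the table can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names (`split_indices`, `select_num_stages`, `setup_for`, `num_repetitions`), the dictionary keys, the use of `round` for split sizes, the seeding scheme, and the 1-indexed stage count are all assumptions made for the example. Only the percentages, learning rates, mini-batch fractions, and repetition counts come from the quoted text.

```python
import random


def split_indices(n_examples, seed):
    """Return (train, validation, test) index lists for one repetition.

    Assumption: split sizes are rounded to the nearest integer.
    """
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)

    # A random 10% of the examples is held out as the test set.
    n_test = round(0.10 * n_examples)
    test, rest = indices[:n_test], indices[n_test:]

    # From the other 90%, 20% is held out as a validation set.
    n_val = round(0.20 * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


def select_num_stages(val_log_likelihood_by_stage):
    """Pick M, the number of boosting stages with the best validation log-likelihood.

    Assumption: entry i holds the validation log-likelihood after i + 1 stages.
    """
    scores = val_log_likelihood_by_stage
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1


def setup_for(dataset_name):
    """Hyperparameters quoted in the Experiment Setup row (key names are hypothetical)."""
    is_year_msd = dataset_name == "Year MSD"
    return {
        "distribution": "Normal",
        "base_learner": "decision tree",
        "max_depth": 3,
        "scoring_rule": "log",
        "learning_rate": 0.1 if is_year_msd else 0.01,
        "minibatch_fraction": 0.1 if is_year_msd else 1.0,
    }


def num_repetitions(dataset_name):
    """20 repetitions for all datasets except Protein (5) and Year MSD (1)."""
    return {"Protein": 5, "Year MSD": 1}.get(dataset_name, 20)
```

In a full run, the model would be fit on `train`, scored on `val` after each boosting stage to obtain the list passed to `select_num_stages`, then retrained on `train + val` (the entire 90%) with the chosen M and evaluated on `test`.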