Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Bayesian Probabilistic Numerical Integration with Tree-Based Models
Authors: Harrison Zhu, Xing Liu, Ruya Kang, Zhichao Shen, Seth Flaxman, François-Xavier Briol
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The advantages and disadvantages of this new methodology are highlighted on a set of benchmark tests including the Genz functions, and on a Bayesian survey design problem. |
| Researcher Affiliation | Academia | Harrison Zhu, Xing Liu (Imperial College London); Ruya Kang (Brown University); Zhichao Shen (University of Oxford); Seth Flaxman (Imperial College London); François-Xavier Briol (University College London) |
| Pseudocode | Yes | Algorithm 1 Sequential Design for BART-Int |
| Open Source Code | No | The paper states that an external tool `dbarts` was used ("For BART-Int, we used the default prior settings in dbarts [20]"), but it does not provide a link or explicit statement about the release of its own source code for the methodology described. |
| Open Datasets | Yes | We use individual-level anonymised census data from the United States [79] ... [79] U.S. Census Bureau. American Community Survey, 2012-2016 ACS 5-Year PUMS Files. Technical report, U.S. Department of Commerce, January 2018. |
| Dataset Splits | No | The paper describes how data points were selected for sequential design and numerical integration (e.g., "n_ini = 20d design points", "n_seq = 20d additional points"), and how ground truth was computed for evaluation, but it does not specify traditional train/validation/test dataset splits with percentages or counts for model training or hyperparameter tuning. |
| Hardware Specification | No | The paper discusses computational complexity and run-times (Figure 2) but does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "dbarts [20]" for BART-Int but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | For BART-Int, we used the default prior settings in dbarts [20], whereas for GP-BQ we used a Matérn kernel whose lengthscale was chosen through maximum likelihood. ... The MAPE is given by $\frac{1}{r} \sum_{t=1}^{r} \lvert \Pi[f] - \hat{\Pi}_t[f] \rvert / \lvert \Pi[f] \rvert$, where $\hat{\Pi}_t[f]$ for $t = 1, \ldots, r$ are estimates of $\Pi[f]$ for $r$ different initial i.i.d. uniform point sets. ... BART-Int (m = 1500, T = 200 m = 1000, T = 50, with a burn-in of 1000 and keeping every 5 samples afterwards) ... The number of post-burn-in samples is chosen to be 10^4. We set γ = 2, di = 0.5i and ci = 0.2i. ... We randomly select our initial set (of size n_ini = 20) and candidate set (of size S = 10,000). |
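
The MAPE quoted in the Experiment Setup row can be illustrated with a short Python sketch. It is not taken from the paper or from any released code; the function name `mape` and its arguments are hypothetical, and it assumes the true integral value Π[f] is known and nonzero.

```python
from typing import Sequence


def mape(true_value: float, estimates: Sequence[float]) -> float:
    """Mean absolute percentage error of integral estimates.

    Hypothetical helper: `true_value` plays the role of Pi[f] and
    `estimates` holds the r estimates, one per initial point set.
    """
    if true_value == 0:
        raise ValueError("MAPE is undefined when the true value is zero")
    if len(estimates) == 0:
        raise ValueError("at least one estimate is required")
    r = len(estimates)
    # Sum of absolute errors, normalised by r and by |Pi[f]|.
    total_abs_error = sum(abs(true_value - est) for est in estimates)
    return total_abs_error / (r * abs(true_value))


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(mape(2.0, [1.0, 3.0, 2.0]))
```

With a true value of 2.0 and estimates 1.0, 3.0 and 2.0, the absolute errors are 1, 1 and 0, so the function returns 2 / (3 × 2) = 1/3.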