BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression
Authors: Zhao Tang Luo, Huiyan Sang, Bani Mallick
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the superior performance of the proposed method using simulation data and a Sacramento housing price data set. |
| Researcher Affiliation | Academia | 1Department of Statistics, Texas A&M University, College Station, TX, USA. |
| Pseudocode | Yes | Algorithm 1 Connecting connected components in G |
| Open Source Code | Yes | An implementation of the proposed model is available at https://github.com/ztluostat/BAMDT. |
| Open Datasets | Yes | We apply BAMDT to analyze housing price data in Sacramento County, California, available in R package caret (Kuhn, 2021). ... Sacramento County GIS. City boundaries: Sacramento County, California, 2015. URL https://earthworks.stanford.edu/catalog/stanford-kq595nj1377. |
| Dataset Splits | Yes | We simulate features for a test data set of size ntest = 200. ... We first compare the prediction performance of the five models using 5-fold cross-validation. |
| Hardware Specification | No | The paper mentions computation time and implementation languages (R, C++) but does not provide specific hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper lists R packages used (igraph, fdaPDE, BART, GpGp, mgcv) along with their primary citation years, but does not specify exact version numbers for these software components. |
| Experiment Setup | Yes | We use M = 50 weak learners in BAMDT. For each weak learner, we randomly sample t = 100 locations from the training data as reference knots. ... We use 100 equally spaced grid points as candidates of univariate split cutoffs for each unstructured feature. The probability of performing a multivariate split is set to be pm = 2/(2 + p). ... Following Chipman et al. (2010), we choose α = 0.95 and β = 2 in (6)... We choose a = 2 by default... We choose ν = 3 and calibrate the prior by selecting λs... we run the MCMC algorithms for 30,000 iterations, discarding the first half and retaining samples every 10 iterations. |
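
The Open Datasets and Software Dependencies rows above point to the Sacramento housing data shipped with the R package caret and to the other R packages cited in the paper. As a minimal sketch of obtaining these artifacts (package versions are not specified in the paper, so whatever CRAN currently provides is assumed):

```r
## Install the R packages cited in the paper (versions unspecified in the paper)
## and load the Sacramento housing data referenced under Open Datasets.
install.packages(c("caret", "igraph", "fdaPDE", "BART", "GpGp", "mgcv"))

library(caret)
data(Sacramento)   # home sales in Sacramento County, CA, bundled with caret
str(Sacramento)    # includes price, latitude, longitude, and structural attributes
```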
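The Dataset Splits row reports 5-fold cross-validation on the housing data (and a simulated test set of size 200). The paper does not give the random seed or fold assignment, so the following is only an illustration of the protocol, not the authors' exact partition:

```r
## Illustrative 5-fold cross-validation split of the Sacramento data.
library(caret)
data(Sacramento)

set.seed(1)                                    # assumed seed, not from the paper
folds <- createFolds(Sacramento$price, k = 5)  # list of 5 held-out index sets

for (i in seq_along(folds)) {
  test_idx <- folds[[i]]
  train <- Sacramento[-test_idx, ]
  test  <- Sacramento[test_idx, ]
  # fit BAMDT (or a competing model) on `train`, predict on `test`,
  # and record out-of-sample error for this fold
}
```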
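The hyperparameters quoted under Experiment Setup can be collected into a single configuration. The names below are illustrative placeholders only; the actual argument names and interface in the authors' repository (https://github.com/ztluostat/BAMDT) may differ, and p is shown with an example value:

```r
## Hypothetical configuration gathering the settings reported in the paper.
p <- 2                                # number of unstructured features (example value)
bamdt_settings <- list(
  M        = 50,                      # number of weak learners
  n_knots  = 100,                     # reference knots sampled per weak learner
  n_cutoff = 100,                     # equally spaced candidate cutoffs per feature
  p_multi  = 2 / (2 + p),             # probability of a multivariate split
  alpha    = 0.95, beta = 2,          # tree prior parameters (Chipman et al., 2010)
  a        = 2,                       # default value of a
  nu       = 3,                       # with lambda calibrated as described in the paper
  n_iter   = 30000,                   # MCMC iterations
  burn_in  = 15000,                   # first half discarded
  thin     = 10                       # retain every 10th sample
)
```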