reproducibilityindex.ai

Beta Diffusion Trees

Authors: Creighton Heaukulani, David Knowles, Zoubin Ghahramani

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conclude with several numerical experiments on missing data problems with data sets of gene expression arrays, international development statistics, and intranational socioeconomic measurements.
Researcher Affiliation	Academia	University of Cambridge, Department of Engineering, Cambridge, UK; Stanford University, Department of Computer Science, Stanford, CA, USA
Pseudocode	No	In Heaukulani et al. (2014), we describe a series of Markov Chain Monte Carlo steps to integrate over the random tree structures of the beta diffusion tree. These moves are summarized as the following proposals: Resample subtrees: Randomly select a subtree rooted at a non-leaf node in the tree, and resample the paths of one or more particles down the subtree according to the prior. Add and remove replicate and stop nodes: Randomly propose an internal (either replicate or stop) node in the tree structure to remove. If the removed node is a replicate node, then the entire subtree emerging from the divergent branch is removed. If the node is a stop node, then the particles that stopped at the node need to be resampled down the remaining tree according to the prior. Conversely, propose adding replicate and stop nodes to branches in the tree. Resample configurations at internal nodes: Randomly select an internal node in the tree and propose changing the decisions that particles take at the node (i.e., the decisions to either replicate at replicate nodes or stop at stop nodes). Heuristics to prune or thicken branches: Propose removing replicate (or stop) nodes at which a small proportion of the particles through the node have decided to replicate (or stop).
Open Source Code	No	No explicit statement about releasing source code or a link to a code repository is provided in the paper.
Open Datasets	Yes	E. Coli dataset of the expression levels of N = 100 genes measured at D = 24 time points (Kao et al., 2004), a UN dataset of human development statistics for N = 161 countries on D = 15 variables (UN Development Programme, 2013), and an India dataset of socioeconomic measurements for N = 400 Indian households on D = 15 variables (Desai & Vanneman, 2013).
Dataset Splits	No	For each data set, we created 10 different test sets, each one holding out a different 10% of the data.
Hardware Specification	No	No specific hardware details (like CPU/GPU models, memory, or cloud instances) used for the experiments are mentioned in the paper.
Software Dependencies	No	No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup	No	All hyperparameters were given broad prior distributions and integrated out with slice sampling (Neal, 2003a).
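
The Dataset Splits row quotes the paper as creating 10 test sets per data set, each holding out a different 10% of the data, but gives no procedure. The sketch below shows one way such splits could be built. It is not the authors' code: the function name `make_holdout_masks`, the seed, and the choice to hold out individual matrix entries in 10 disjoint folds are all assumptions made for illustration. The 100 x 24 shape matches the E. Coli data set described in the Open Datasets row.

```python
import numpy as np


def make_holdout_masks(shape, n_splits=10, seed=0):
    """Return n_splits boolean masks; True marks a held-out entry.

    Assumption: entries are held out at random and the folds are
    disjoint, so every entry is held out in exactly one test set.
    """
    rng = np.random.default_rng(seed)
    n_entries = int(np.prod(shape))
    order = rng.permutation(n_entries)        # random ordering of entry indices
    folds = np.array_split(order, n_splits)   # near-equal disjoint folds
    masks = []
    for fold in folds:
        mask = np.zeros(n_entries, dtype=bool)
        mask[fold] = True                     # mark this fold as held out
        masks.append(mask.reshape(shape))
    return masks


# N = 100 genes by D = 24 time points, as in the E. Coli data set.
masks = make_holdout_masks((100, 24))
```

Each mask can then be used to hide the held-out entries during inference and to score predictions on them afterwards. Recording the seed alongside the masks would make the splits recoverable.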