Beta Diffusion Trees
Authors: Creighton Heaukulani, David Knowles, Zoubin Ghahramani
ICML 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with several numerical experiments on missing data problems with data sets of gene expression arrays, international development statistics, and intranational socioeconomic measurements. |
| Researcher Affiliation | Academia | University of Cambridge, Department of Engineering, Cambridge, UK; Stanford University, Department of Computer Science, Stanford, CA, USA |
| Pseudocode | No | In Heaukulani et al. (2014), we describe a series of Markov Chain Monte Carlo steps to integrate over the random tree structures of the beta diffusion tree. These moves are summarized as the following proposals: Resample subtrees: Randomly select a subtree rooted at a non-leaf node in the tree, and resample the paths of one or more particles down the subtree according to the prior. Add and remove replicate and stop nodes: Randomly propose an internal (either replicate or stop) node in the tree structure to remove. If the removed node is a replicate node, then the entire subtree emerging from the divergent branch is removed. If the node is a stop node, then the particles that stopped at the node need to be resampled down the remaining tree according to the prior. Conversely, propose adding replicate and stop nodes to branches in the tree. Resample configurations at internal nodes: Randomly select an internal node in the tree and propose changing the decisions that particles take at the node (i.e., the decisions to either replicate at replicate nodes or stop at stop nodes). Heuristics to prune or thicken branches: Propose removing replicate (or stop) nodes at which a small proportion of the particles through the node have decided to replicate (or stop). |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository is provided in the paper. |
| Open Datasets | Yes | E. Coli dataset of the expression levels of N = 100 genes measured at D = 24 time points (Kao et al., 2004), a UN dataset of human development statistics for N = 161 countries on D = 15 variables (UN Development Programme, 2013), and an India dataset of socioeconomic measurements for N = 400 Indian households on D = 15 variables (Desai & Vanneman, 2013). |
| Dataset Splits | No | For each data set, we created 10 different test sets, each one holding out a different 10% of the data. |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models, memory, or cloud instances) used for the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | No | All hyperparameters were given broad prior distributions and integrated out with slice sampling (Neal, 2003a). |
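
The Dataset Splits row quotes a protocol of 10 test sets, each holding out a different 10% of the data, but gives no further detail. The sketch below shows one way such splits might be built. It assumes that individual matrix entries (rather than whole rows) are held out and that the 10 test sets are disjoint; neither assumption is confirmed by the quoted text. The function name `make_holdout_masks` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def make_holdout_masks(n_rows, n_cols, n_folds=10, seed=0):
    """Return n_folds boolean masks of shape (n_rows, n_cols).

    True marks an entry that is held out (treated as missing) in that fold.
    Assumption: entries are held out individually and folds are disjoint.
    """
    rng = np.random.default_rng(seed)
    n_entries = n_rows * n_cols
    # Shuffle all entry indices once, then cut them into near-equal chunks.
    perm = rng.permutation(n_entries)
    masks = []
    for chunk in np.array_split(perm, n_folds):
        mask = np.zeros(n_entries, dtype=bool)
        mask[chunk] = True
        masks.append(mask.reshape(n_rows, n_cols))
    return masks


# Dimensions of the gene expression data set in the table: N = 100, D = 24.
masks = make_holdout_masks(100, 24)
```

Each mask can then be used to hide entries before fitting and to select the same entries when scoring predictions. Drawing an independent random 10% for each test set would be an equally plausible reading of the quoted sentence.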