Active Learning for Non-Parametric Regression Using Purely Random Trees
Authors: Jack Goetz, Ambuj Tewari, Paul Zimmerman
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now examine the benefits of active learning on both simulated and real world data. We simulate 2 data sets, one with differing noise variance (our σ²_{ε,k} term), the other with differing function complexity (our bias²_k term), in different regions of [0, 1]^d. We also examine performance on the Wine quality data set from UCI and a data set of activation energies of Claisen rearrangement reactions (Cl). We compare the performance of selecting points to label using random sampling, our active algorithm, and a naive uncertainty sampling version of our active algorithm, where each leaf's n_k is proportional to its variance. In all experiments n^(1) = n/2 and Mondrian Trees are grown using λ_n = n^(2/(2+d)) - 1, which is theoretically motivated, but corrected so when n = 1, λ_n = 0. We use both Mondrian and Breiman Trees [5] as our final regressor. Details of the data sets are in the appendix, which also contains forest versions of these experiments. Additionally all code and experiments (as well as other experiments) are available at https://github.com/jackrgoetz/Mondrian_Tree_AL. |
| Researcher Affiliation | Academia | Jack Goetz, Ambuj Tewari, Paul Zimmerman; University of Michigan, Ann Arbor, MI 48109; {jrgoetz, tewaria, paulzim}@umich.edu |
| Pseudocode | Yes | Algorithm 1: Generic 'oracle' querying algorithm; Algorithm 2: Active 'oracle estimating' algorithm |
| Open Source Code | Yes | Additionally all code and experiments (as well as other experiments) are available at https://github.com/jackrgoetz/Mondrian_Tree_AL. |
| Open Datasets | Yes | We also examine performance on the Wine quality data set from UCI and a data set of activation energies of Claisen rearrangement reactions (Cl). Details of the data sets are in the appendix, which also contains forest versions of these experiments. |
| Dataset Splits | No | The paper describes a pool-based active learning setting where data points are selected for labeling. It does not provide specific train/validation/test dataset splits, percentages, or absolute sample counts for the experimental setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | In all experiments n^(1) = n/2 and Mondrian Trees are grown using λ_n = n^(2/(2+d)) - 1, which is theoretically motivated, but corrected so when n = 1, λ_n = 0. We use both Mondrian and Breiman Trees [5] as our final regressor. |
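
The experiment-setup rows above quote two computational details in passing: the Mondrian lifetime λ_n = n^(2/(2+d)) - 1 (chosen so that λ_1 = 0) and the uncertainty-sampling baseline that spreads label queries over leaves in proportion to their variance. The sketch below illustrates how those two quantities could be computed. It is a minimal illustration under our own naming assumptions (`mondrian_lifetime`, `variance_proportional_allocation`), not the authors' implementation, which lives in the linked GitHub repository.

```python
# Minimal sketch, not the authors' code (see https://github.com/jackrgoetz/Mondrian_Tree_AL).
# Illustrates (a) the lifetime formula as quoted in the experiment-setup row and
# (b) allocating a labelling budget across leaves in proportion to each leaf's
# estimated label variance (the "naive uncertainty sampling" baseline).
import numpy as np


def mondrian_lifetime(n: int, d: int) -> float:
    """Lifetime lambda_n = n^(2/(2+d)) - 1, which equals 0 when n = 1."""
    return n ** (2.0 / (2.0 + d)) - 1.0


def variance_proportional_allocation(leaf_label_values, budget: int):
    """Split `budget` new label queries across leaves proportionally to the
    sample variance of the labels already observed in each leaf."""
    variances = np.array(
        [np.var(y) if len(y) > 1 else 0.0 for y in leaf_label_values], dtype=float
    )
    if variances.sum() == 0.0:
        # No variance information yet: fall back to a uniform split.
        weights = np.full(len(leaf_label_values), 1.0 / len(leaf_label_values))
    else:
        weights = variances / variances.sum()
    # Round the real-valued targets down, then hand the leftover queries
    # to the highest-variance leaves so the total equals the budget exactly.
    allocation = np.floor(weights * budget).astype(int)
    remainder = budget - allocation.sum()
    allocation[np.argsort(-weights)[:remainder]] += 1
    return allocation


if __name__ == "__main__":
    n, d = 1000, 2
    print("lambda_n =", mondrian_lifetime(n, d))  # ~ sqrt(n) - 1 when d = 2
    # First half of the budget is labelled at random (n^(1) = n / 2);
    # the remaining budget is allocated by leaf variance.
    rng = np.random.default_rng(0)
    leaves = [rng.normal(0.0, s, size=20) for s in (0.1, 1.0, 3.0)]
    print("allocation:", variance_proportional_allocation(leaves, n // 2))
```

The allocation helper keeps the total number of queries exactly equal to the budget by flooring the proportional targets and giving the leftover queries to the leaves with the largest weights; any smarter tie-breaking or per-leaf caps would be an extension beyond what the quoted setup describes.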