Active Learning for Non-Parametric Regression Using Purely Random Trees
Authors: Jack Goetz, Ambuj Tewari, Paul Zimmerman
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now examine the benefits of active learning on both simulated and real world data. We simulate 2 data sets, one with differing noise variance (our σ²_{ε,k} term), the other with differing function complexity (our bias²_k term), in different regions of [0, 1]^d. We also examine performance on the Wine quality data set from UCI and a data set of activation energies of Claisen rearrangement reactions (Cl). We compare the performance of selecting points to label using random sampling, our active algorithm, and a naive uncertainty sampling version of our active algorithm, where each leaf's n_k is proportional to its variance. In all experiments n^(1) = n/2 and Mondrian Trees are grown using λ_n = n^(2/(2+d)) - 1, which is theoretically motivated, but corrected so when n = 1, λ_n = 0. We use both Mondrian and Breiman Trees [5] as our final regressor. Details of the data sets are in the appendix, which also contains forest versions of these experiments. Additionally all code and experiments (as well as other experiments) are available at https://github.com/jackrgoetz/Mondrian_Tree_AL. |
| Researcher Affiliation | Academia | Jack Goetz, Ambuj Tewari, Paul Zimmerman; University of Michigan, Ann Arbor, MI 48109; {jrgoetz, tewaria, paulzim}@umich.edu |
| Pseudocode | Yes | Algorithm 1: Generic 'oracle' querying algorithm; Algorithm 2: Active 'oracle estimating' algorithm |
| Open Source Code | Yes | Additionally all code and experiments (as well as other experiments) are available at https://github.com/jackrgoetz/Mondrian_Tree_AL. |
| Open Datasets | Yes | We also examine performance on the Wine quality data set from UCI and a data set of activation energies of Claisen rearrangement reactions (Cl). Details of the data sets are in the appendix, which also contains forest versions of these experiments. |
| Dataset Splits | No | The paper describes a pool-based active learning setting where data points are selected for labeling. It does not provide specific train/validation/test dataset splits, percentages, or absolute sample counts for the experimental setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | In all experiments n^(1) = n/2 and Mondrian Trees are grown using λ_n = n^(2/(2+d)) - 1, which is theoretically motivated, but corrected so when n = 1, λ_n = 0. We use both Mondrian and Breiman Trees [5] as our final regressor. |
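
The experiment-setup rows above quote two computational details in passing: the Mondrian lifetime λ_n = n^(2/(2+d)) - 1 (chosen so that λ_1 = 0) and the uncertainty-sampling baseline that spreads label queries over leaves in proportion to their variance. The sketch below illustrates how those two quantities could be computed. It is a minimal illustration under our own naming assumptions (`mondrian_lifetime`, `variance_proportional_allocation`), not the authors' implementation, which lives in the linked GitHub repository.

```python
# Minimal sketch, not the authors' code (see https://github.com/jackrgoetz/Mondrian_Tree_AL).
# Illustrates (a) the lifetime formula as quoted in the experiment-setup row and
# (b) allocating a labelling budget across leaves in proportion to each leaf's
# estimated label variance (the "naive uncertainty sampling" baseline).
import numpy as np


def mondrian_lifetime(n: int, d: int) -> float:
    """Lifetime lambda_n = n^(2/(2+d)) - 1, which equals 0 when n = 1."""
    return n ** (2.0 / (2.0 + d)) - 1.0


def variance_proportional_allocation(leaf_label_values, budget: int):
    """Split `budget` new label queries across leaves proportionally to the
    sample variance of the labels already observed in each leaf."""
    variances = np.array(
        [np.var(y) if len(y) > 1 else 0.0 for y in leaf_label_values], dtype=float
    )
    if variances.sum() == 0.0:
        # No variance information yet: fall back to a uniform split.
        weights = np.full(len(leaf_label_values), 1.0 / len(leaf_label_values))
    else:
        weights = variances / variances.sum()
    # Round the real-valued targets down, then hand the leftover queries
    # to the highest-variance leaves so the total equals the budget exactly.
    allocation = np.floor(weights * budget).astype(int)
    remainder = budget - allocation.sum()
    allocation[np.argsort(-weights)[:remainder]] += 1
    return allocation


if __name__ == "__main__":
    n, d = 1000, 2
    print("lambda_n =", mondrian_lifetime(n, d))  # ~ sqrt(n) - 1 when d = 2
    # First half of the budget is labelled at random (n^(1) = n / 2);
    # the remaining budget is allocated by leaf variance.
    rng = np.random.default_rng(0)
    leaves = [rng.normal(0.0, s, size=20) for s in (0.1, 1.0, 3.0)]
    print("allocation:", variance_proportional_allocation(leaves, n // 2))
```

The allocation helper keeps the total number of queries exactly equal to the budget by flooring the proportional targets and giving the leftover queries to the leaves with the largest weights; any smarter tie-breaking or per-leaf caps would be an extension beyond what the quoted setup describes.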