Mondrian Forests: Efficient Online Random Forests

Authors: Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

NeurIPS 2014

Reproducibility assessment. For each variable below, the assessed result is followed by the LLM response quoting the paper's supporting evidence.
Research Type: Experimental
"We demonstrate the excellent empirical performance of MF in Section 7, and conclude in Section 8 with a discussion about future work. The purpose of these experiments is to evaluate the predictive performance (test accuracy) of MF as a function of (i) fraction of training data and (ii) training time. We divide the training data into 100 mini-batches and compare the performance of online random forests (MF, ORF-Saffari [20]) to batch random forests (Breiman-RF, ERT-k, ERT-1), which are trained on the same fraction of the training data."
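As a concrete reading of that protocol, here is a minimal sketch, assuming a generic online model with a partial_fit-style update and scikit-learn's RandomForestClassifier as the batch baseline; the function name and interface are illustrative, not the authors' scripts.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def accuracy_vs_fraction(online_model, X_train, y_train, X_test, y_test,
                         n_batches=100, seed=0):
    """Feed the training set to an online model in mini-batches; at each
    checkpoint, retrain a batch forest on the same fraction of the data."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_train))
    results, seen = [], []
    for i, idx in enumerate(np.array_split(order, n_batches), start=1):
        seen.extend(idx)
        # Online update on the new mini-batch only (MF / ORF-style).
        online_model.partial_fit(X_train[idx], y_train[idx])
        # Batch baseline retrained from scratch on everything seen so far.
        batch_rf = RandomForestClassifier(n_estimators=100).fit(
            X_train[seen], y_train[seen])
        results.append((i / n_batches,
                        online_model.score(X_test, y_test),
                        batch_rf.score(X_test, y_test)))
    return results
```

Retraining the batch forest at each checkpoint on exactly the data the online model has seen makes the two accuracy curves comparable at equal training-set fractions.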
Researcher Affiliation: Academia
Balaji Lakshminarayanan (Gatsby Unit, University College London); Daniel M. Roy (Department of Engineering, University of Cambridge); Yee Whye Teh (Department of Statistics, University of Oxford).
Pseudocode: Yes
Algorithm 1 (Sample Mondrian Tree), Algorithm 2 (Sample Mondrian Block), Algorithm 3 (Extend Mondrian Tree(T, λ, D)), and Algorithm 4 (Extend Mondrian Block(T, λ, j, D)).
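The listed algorithms are not reproduced on this page, but the core recursion of Algorithms 1 and 2 (sampling a Mondrian block over the smallest box containing the data) can be sketched as follows, assuming numpy and a dict-based tree representation; this is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def sample_mondrian_block(X, tau_parent, lam, rng):
    """Recursively sample a Mondrian block over the smallest box containing X."""
    lower, upper = X.min(axis=0), X.max(axis=0)
    extent = float((upper - lower).sum())
    # Time to the next split is Exponential with rate = sum of side lengths.
    tau = (tau_parent + rng.exponential(1.0 / extent)) if extent > 0 else np.inf
    if tau >= lam:  # lifetime exhausted (or a single repeated point): leaf
        return {"leaf": True, "lower": lower, "upper": upper}
    # Split dimension chosen proportional to side length; cut uniform within it.
    d = int(rng.choice(len(lower), p=(upper - lower) / extent))
    xi = float(rng.uniform(lower[d], upper[d]))
    left = X[:, d] <= xi
    return {"leaf": False, "tau": tau, "dim": d, "cut": xi,
            "left": sample_mondrian_block(X[left], tau, lam, rng),
            "right": sample_mondrian_block(X[~left], tau, lam, rng)}

# Algorithm 1 (Sample Mondrian Tree) then reduces to one call on the full data:
# tree = sample_mondrian_block(X_train, 0.0, lam=np.inf, rng=np.random.default_rng(0))
```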
Open Source Code: Yes
"Our scripts are implemented in Python. We implemented the ORF-Saffari algorithm as well as ERT in Python for timing comparisons. The scripts can be downloaded from the authors' webpages."
Open Datasets: Yes
"We evaluate on four of the five datasets used in [20]; we excluded the mushroom dataset, as even very simple logical rules achieve > 99% accuracy on this dataset (footnote 4: https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.names). The dna dataset is described at https://www.sgi.com/tech/mlc/db/DNA.names (footnote 5)."
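For convenience, the four evaluation datasets are also mirrored on OpenML, so a loading sketch might look like the following; the OpenML names are assumptions (the paper cites the original UCI and SGI/mlc sources), and this does not reproduce the pre-defined train/test splits quoted below.

```python
from sklearn.datasets import fetch_openml

# OpenML dataset names are assumptions, not the paper's own sources.
datasets = {}
for name in ["usps", "satimage", "letter", "dna"]:
    X, y = fetch_openml(name, return_X_y=True, as_frame=False)
    datasets[name] = (X, y)
```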
Dataset Splits: Yes
"We used the pre-defined train/test split."

    Dataset     D    K    N_train  N_test
    usps        256  10   7291     2007
    satimages   36   6    3104     2000
    letter      16   26   15000    5000
    dna         180  3    1400     1186
Hardware Specification: No
"The scikit-learn implementation uses highly optimized C code, hence we do not compare our runtimes with the scikit-learn implementation." (This excerpt concerns the scikit-learn implementation's optimization; the paper does not specify the hardware used for its experiments.)
Software Dependencies: No
"Our scripts are implemented in Python. We used the Breiman-RF implementation in scikit-learn [16]." (Specific version numbers for Python and scikit-learn are not provided.)
Experiment Setup: Yes
"As is common in the random forest literature [2], we set the number of trees M = 100. For Mondrian forests, we set the lifetime λ = ∞ and the HNSP discount parameter γ = 10D. For ORF-Saffari, we set num epochs = 20 (number of passes through the training data) and set the other hyperparameters to the values used in [20]. For Breiman-RF and ERT, the hyperparameters are set to default values."
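Expressed as configuration, these settings might be wired up as in the sketch below, assuming scikit-learn's RandomForestClassifier and ExtraTreesClassifier as the Breiman-RF and ERT baselines; the MF parameters are kept in a plain dict, since the paper's Mondrian forest is a standalone Python implementation rather than a scikit-learn estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

M = 100   # trees per forest, common to all methods
D = 36    # input dimensionality (e.g. satimages)

# Mondrian forest settings quoted above (illustrative parameter names).
mf_params = {"n_trees": M, "lifetime": np.inf, "discount": 10 * D}

# Batch baselines with other hyperparameters left at their defaults.
breiman_rf = RandomForestClassifier(n_estimators=M)
ert_k = ExtraTreesClassifier(n_estimators=M)                   # ERT-k
ert_1 = ExtraTreesClassifier(n_estimators=M, max_features=1)   # ERT-1
```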