Machine Unlearning for Random Forests

Authors: Jonathan Brophy, Daniel Lowd

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments on 13 real-world datasets and one synthetic dataset, we find DaRE forests delete data orders of magnitude faster than retraining from scratch while sacrificing little to no predictive power."
Researcher Affiliation | Academia | "Department of Computer and Information Science, University of Oregon, Eugene, Oregon."
Pseudocode | Yes | "We present abridged versions for training and updating a DaRE tree in Algorithms 1 and 2, respectively, with full explanations below. Detailed pseudocode for both operations is in the Appendix, A.8."
Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described.
Open Datasets | Yes | "We conduct our experiments on 13 publicly-available datasets that represent problems well-suited for tree-based models, and one synthetic dataset we call Synthetic. For each dataset, we generate one-hot encodings for any categorical variable and leave all numeric and binary variables as is."
Dataset Splits | Yes | "For any dataset without a designated train and test split, we randomly sample 80% of the data for training and use the rest for testing. ... first, we tune a greedy model (i.e. by keeping d_rmax = 0 fixed) using 5-fold cross-validation."
Hardware Specification | Yes | "System hardware specifications are in the Appendix: B. ... All experiments were run on a Debian 10 Linux machine with an AMD Ryzen 9 3950X 16-Core Processor and 128 GB of RAM."
Software Dependencies | No | The paper mentions using Scikit-learn but does not specify exact version numbers for any software dependencies.
Experiment Setup | Yes | "Using these metrics and Gini index as the split criterion, we tune the following hyperparameters: the maximum depth of each tree d_max, the number of trees in the forest T, and the number of thresholds considered per attribute for greedy nodes k. ... Selected hyperparameter values are in the Appendix: B.2, Table 6."
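
The Open Datasets and Dataset Splits rows describe a simple preprocessing protocol: one-hot encode categorical variables, leave numeric and binary variables as is, and hold out 20% of any dataset without a designated split. A minimal sketch with pandas and scikit-learn (the paper mentions Scikit-learn); the file path and the label column name "label" are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset: a CSV with a binary target column named "label".
df = pd.read_csv("dataset.csv")

# One-hot encode categorical columns; numeric and binary columns pass through.
categorical_cols = df.select_dtypes(include=["object", "category"]).columns
df = pd.get_dummies(df, columns=list(categorical_cols))

X = df.drop(columns=["label"]).to_numpy()
y = df["label"].to_numpy()

# For datasets without a designated split: 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)
```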
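The Dataset Splits and Experiment Setup rows together outline the tuning protocol: fix the model to its greedy variant, then select hyperparameters such as the maximum tree depth d_max and forest size T with 5-fold cross-validation, using Gini as the split criterion. The sketch below uses scikit-learn's RandomForestClassifier as a stand-in for a greedy DaRE forest; DaRE's threshold count k has no direct scikit-learn analogue, and the grid values are illustrative, not the paper's:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid over the shared hyperparameters: d_max and T.
param_grid = {
    "max_depth": [1, 3, 5, 10, 20],
    "n_estimators": [10, 50, 100, 250],
}

search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation, as in the paper
    scoring="accuracy",  # assumption: the paper tunes a per-dataset metric
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train/y_train from the sketch above
print(search.best_params_, round(search.best_score_, 4))
```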
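The Pseudocode row refers to Algorithms 1 and 2, which train and update a DaRE tree by caching statistics at each node so that a deletion only retrains the subtrees whose cached split is invalidated. The toy class below illustrates that idea but is not the paper's algorithm: it handles binary 0/1 features and integer labels only, stores raw data at every node instead of DaRE's count statistics, and omits DaRE's random upper layers and threshold sampling (k). All names are hypothetical.

```python
import numpy as np

def gini_gain(y, mask):
    """Gini impurity reduction of splitting integer labels y by a boolean mask."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = np.bincount(labels, minlength=2) / len(labels)
        return 1.0 - float(np.sum(p ** 2))
    n = len(y)
    left, right = y[mask], y[~mask]
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)

class ToyUnlearnableTree:
    """Decision tree over binary 0/1 features that supports instance deletion.

    Each node caches its training subset and the scores of all candidate
    splits. Deleting an instance walks its root-to-leaf path, updates the
    caches, and retrains a subtree only when its stored split is no longer
    the best-scoring one (the core idea behind DaRE's greedy nodes).
    """

    def __init__(self, max_depth=3):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.root = self._build(np.asarray(X), np.asarray(y), depth=0)
        return self

    def _build(self, X, y, depth):
        node = {"X": X, "y": y, "depth": depth}
        if depth == self.max_depth or len(np.unique(y)) < 2:
            node["leaf"] = True
            return node
        scores = {f: gini_gain(y, X[:, f] == 1) for f in range(X.shape[1])}
        best = max(scores, key=scores.get)
        node.update(leaf=False, feature=best, scores=scores)
        mask = X[:, best] == 1
        node["left"] = self._build(X[mask], y[mask], depth + 1)
        node["right"] = self._build(X[~mask], y[~mask], depth + 1)
        return node

    def delete(self, x, label):
        self._delete(self.root, np.asarray(x), label)

    def _delete(self, node, x, label):
        # Remove one copy of (x, label) from this node's cached subset.
        hits = np.where(np.all(node["X"] == x, axis=1) & (node["y"] == label))[0]
        assert len(hits) > 0, "instance not found in training data"
        node["X"] = np.delete(node["X"], hits[0], axis=0)
        node["y"] = np.delete(node["y"], hits[0])
        if node["leaf"]:
            return
        # Re-score candidate splits on the updated subset.
        X, y = node["X"], node["y"]
        scores = {f: gini_gain(y, X[:, f] == 1) for f in range(X.shape[1])}
        best = max(scores, key=scores.get)
        if best != node["feature"]:
            # Cached split invalidated: retrain only this subtree.
            depth = node["depth"]
            node.clear()
            node.update(self._build(X, y, depth))
            return
        node["scores"] = scores
        child = node["left"] if x[node["feature"]] == 1 else node["right"]
        self._delete(child, x, label)

    def predict_one(self, x):
        node = self.root
        while not node["leaf"]:
            node = node["left"] if x[node["feature"]] == 1 else node["right"]
        return int(np.argmax(np.bincount(node["y"], minlength=2)))
```

A toy run on a 0/1 integer matrix X01 with labels y01: `tree = ToyUnlearnableTree().fit(X01, y01)` followed by `tree.delete(X01[0], y01[0])`. Only subtrees whose cached best split changes are rebuilt, which is the mechanism behind the paper's reported orders-of-magnitude speedups over retraining from scratch.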