Machine Unlearning for Random Forests

Authors: Jonathan Brophy, Daniel Lowd

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments on 13 real-world datasets and one synthetic dataset, we find DaRE forests delete data orders of magnitude faster than retraining from scratch while sacrificing little to no predictive power."
Researcher Affiliation | Academia | "Department of Computer and Information Science, University of Oregon, Eugene, Oregon."
Pseudocode | Yes | "We present abridged versions for training and updating a DaRE tree in Algorithms 1 and 2, respectively, with full explanations below. Detailed pseudocode for both operations is in the Appendix, A.8."
Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the methodology described.
Open Datasets | Yes | "We conduct our experiments on 13 publicly-available datasets that represent problems well-suited for tree-based models, and one synthetic dataset we call Synthetic. For each dataset, we generate one-hot encodings for any categorical variable and leave all numeric and binary variables as is."
Dataset Splits | Yes | "For any dataset without a designated train and test split, we randomly sample 80% of the data for training and use the rest for testing. ... first, we tune a greedy model (i.e. by keeping d_rmax = 0 fixed) using 5-fold cross-validation."
Hardware Specification | Yes | "System hardware specifications are in the Appendix: B. ... All experiments were run on a Debian 10 Linux machine with an AMD Ryzen 9 3950X 16-Core Processor and 128 GB of RAM."
Software Dependencies | No | The paper mentions using Scikit-learn but does not specify exact version numbers for any software dependencies.
Experiment Setup | Yes | "Using these metrics and Gini index as the split criterion, we tune the following hyperparameters: the maximum depth of each tree d_max, the number of trees in the forest T, and the number of thresholds considered per attribute for greedy nodes k. ... Selected hyperparameter values are in the Appendix: B.2, Table 6."
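
The Open Datasets and Dataset Splits rows describe a simple preprocessing protocol: one-hot encode categorical variables, leave numeric and binary variables as is, and hold out 20% of any dataset without a designated split. A minimal sketch with pandas and scikit-learn (the paper mentions Scikit-learn); the file path and the label column name "label" are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset: a CSV with a binary target column named "label".
df = pd.read_csv("dataset.csv")

# One-hot encode categorical columns; numeric and binary columns pass through.
categorical_cols = df.select_dtypes(include=["object", "category"]).columns
df = pd.get_dummies(df, columns=list(categorical_cols))

X = df.drop(columns=["label"]).to_numpy()
y = df["label"].to_numpy()

# For datasets without a designated split: 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)
```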
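The Dataset Splits and Experiment Setup rows together outline the tuning protocol: fix the model to its greedy variant, then select hyperparameters such as the maximum tree depth d_max and forest size T with 5-fold cross-validation, using Gini as the split criterion. The sketch below uses scikit-learn's RandomForestClassifier as a stand-in for a greedy DaRE forest; DaRE's threshold count k has no direct scikit-learn analogue, and the grid values are illustrative, not the paper's:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid over the shared hyperparameters: d_max and T.
param_grid = {
    "max_depth": [1, 3, 5, 10, 20],
    "n_estimators": [10, 50, 100, 250],
}

search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation, as in the paper
    scoring="accuracy",  # assumption: the paper tunes a per-dataset metric
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train/y_train from the sketch above
print(search.best_params_, round(search.best_score_, 4))
```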
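The Pseudocode row refers to Algorithms 1 and 2, which train and update a DaRE tree by caching statistics at each node so that a deletion only retrains the subtrees whose cached split is invalidated. The toy class below illustrates that idea but is not the paper's algorithm: it handles binary 0/1 features and integer labels only, stores raw data at every node instead of DaRE's count statistics, and omits DaRE's random upper layers and threshold sampling (k). All names are hypothetical.

```python
import numpy as np

def gini_gain(y, mask):
    """Gini impurity reduction of splitting integer labels y by a boolean mask."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = np.bincount(labels, minlength=2) / len(labels)
        return 1.0 - float(np.sum(p ** 2))
    n = len(y)
    left, right = y[mask], y[~mask]
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)

class ToyUnlearnableTree:
    """Decision tree over binary 0/1 features that supports instance deletion.

    Each node caches its training subset and the scores of all candidate
    splits. Deleting an instance walks its root-to-leaf path, updates the
    caches, and retrains a subtree only when its stored split is no longer
    the best-scoring one (the core idea behind DaRE's greedy nodes).
    """

    def __init__(self, max_depth=3):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.root = self._build(np.asarray(X), np.asarray(y), depth=0)
        return self

    def _build(self, X, y, depth):
        node = {"X": X, "y": y, "depth": depth}
        if depth == self.max_depth or len(np.unique(y)) < 2:
            node["leaf"] = True
            return node
        scores = {f: gini_gain(y, X[:, f] == 1) for f in range(X.shape[1])}
        best = max(scores, key=scores.get)
        node.update(leaf=False, feature=best, scores=scores)
        mask = X[:, best] == 1
        node["left"] = self._build(X[mask], y[mask], depth + 1)
        node["right"] = self._build(X[~mask], y[~mask], depth + 1)
        return node

    def delete(self, x, label):
        self._delete(self.root, np.asarray(x), label)

    def _delete(self, node, x, label):
        # Remove one copy of (x, label) from this node's cached subset.
        hits = np.where(np.all(node["X"] == x, axis=1) & (node["y"] == label))[0]
        assert len(hits) > 0, "instance not found in training data"
        node["X"] = np.delete(node["X"], hits[0], axis=0)
        node["y"] = np.delete(node["y"], hits[0])
        if node["leaf"]:
            return
        # Re-score candidate splits on the updated subset.
        X, y = node["X"], node["y"]
        scores = {f: gini_gain(y, X[:, f] == 1) for f in range(X.shape[1])}
        best = max(scores, key=scores.get)
        if best != node["feature"]:
            # Cached split invalidated: retrain only this subtree.
            depth = node["depth"]
            node.clear()
            node.update(self._build(X, y, depth))
            return
        node["scores"] = scores
        child = node["left"] if x[node["feature"]] == 1 else node["right"]
        self._delete(child, x, label)

    def predict_one(self, x):
        node = self.root
        while not node["leaf"]:
            node = node["left"] if x[node["feature"]] == 1 else node["right"]
        return int(np.argmax(np.bincount(node["y"], minlength=2)))
```

A toy run on a 0/1 integer matrix X01 with labels y01: `tree = ToyUnlearnableTree().fit(X01, y01)` followed by `tree.delete(X01[0], y01[0])`. Only subtrees whose cached best split changes are rebuilt, which is the mechanism behind the paper's reported orders-of-magnitude speedups over retraining from scratch.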