Evasion and Hardening of Tree Ensemble Classifiers

Authors: Alex Kantchelian, J. D. Tygar, Anthony D. Joseph

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.
Researcher Affiliation | Academia | Alex Kantchelian (AKANT@CS.BERKELEY.EDU), J. D. Tygar (TYGAR@CS.BERKELEY.EDU), Anthony D. Joseph (ADJ@CS.BERKELEY.EDU); University of California, Berkeley
Pseudocode | Yes | Algorithm 1 Coordinate Descent for Problem (1)
Open Source Code | No | The paper discusses the use of third-party tools like XGBoost, scikit-learn, Gurobi, and Theano, but does not provide an explicit statement or link for the authors' own implementation code for the described methodology.
Open Datasets | Yes | We choose digit recognition over the MNIST (Le Cun et al.) dataset as our benchmark classification task.
Dataset Splits | No | The paper states: 'Our training and testing sets respectively include 11,876 and 1,990 images' and 'tune the hyper-parameters so as to minimize the error on the testing set directly', indicating the test set was used for tuning, but does not specify a separate validation dataset split.
Hardware Specification | Yes | Unlike BDT, BDT-R is extremely challenging to optimally evade using the MILP solver: the branch-and-bound search continues to expand nodes after 1 day on a 6 core Xeon 3.2GHz machine.
Software Dependencies | Yes | We use the Gurobi (Gurobi Optimization, 2015) solver to compute the optimal evasions for all distances and all models but NN and RBF-SVM.
Experiment Setup | Yes | Table 1 summarizes the 7 benchmarked models with their salient hyper-parameters and error rates on the testing set. For example: 'BDT 1,000 trees, depth 4, η = 0.02', 'RF 80 trees, max. depth 22', 'RBF-SVM γ = 0.04, C = 1'. Additionally, 'Here, we use B = 28, the size of the picture diagonal, as our budget.'
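
Since no author code is available, the hyper-parameters quoted in the Experiment Setup row could be recorded roughly as in the sketch below. This is not the authors' implementation. The names `MODEL_CONFIGS` and `build_model` are hypothetical, and the mapping of the quoted values onto `XGBClassifier`, `RandomForestClassifier`, and `SVC` arguments (for example η onto `learning_rate`) is an assumption; all other settings are left at library defaults, which may differ from what the paper used.

```python
# Hyper-parameters quoted in the Experiment Setup row.
# The argument names they are mapped to are an assumption, not from the paper.
MODEL_CONFIGS = {
    "BDT": {"n_estimators": 1000, "max_depth": 4, "learning_rate": 0.02},
    "RF": {"n_estimators": 80, "max_depth": 22},
    "RBF-SVM": {"gamma": 0.04, "C": 1.0},
}


def build_model(name):
    """Instantiate one of the three quoted models (hypothetical helper).

    Libraries are imported lazily so the configuration table can be
    inspected even when XGBoost or scikit-learn is not installed.
    """
    params = MODEL_CONFIGS[name]  # raises KeyError for models not quoted above
    if name == "BDT":
        from xgboost import XGBClassifier  # assumed: scikit-learn style wrapper
        return XGBClassifier(**params)
    if name == "RF":
        from sklearn.ensemble import RandomForestClassifier
        return RandomForestClassifier(**params)
    # Remaining key is "RBF-SVM".
    from sklearn.svm import SVC
    return SVC(kernel="rbf", **params)
```

Calling `build_model("RF")` would return an untrained random forest with 80 trees and a maximum depth of 22. The other four models in Table 1 are not quoted in this summary and are therefore omitted.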