Efficient Label Contamination Attacks Against Black-Box Learning Models

Authors: Mengchen Zhao, Bo An, Wei Gao, Teng Zhang

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies show that PGA significantly outperforms existing baselines and that linear learning models are better substitute models than nonlinear ones.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Nanyang Technological University, Singapore; 2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Pseudocode | Yes | Algorithm 1: Projected Gradient Ascent (PGA) and Algorithm 2: Flip strategy
Open Source Code | No | The paper describes implementations built on LIBSVM [Chang and Lin, 2011] and LIBLINEAR [Fan et al., 2008], which are third-party tools; it does not provide access to the source code for its own methodology.
Open Datasets | Yes | We will use five public data sets: Australian (690 points, 14 features), W8a (10000 points, 300 features), Spambase (4601 points, 57 features) [Lichman, 2013], Wine (130 points, 14 features) and Skin (5000 points, 3 features). (Except Spambase, all data sets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.)
Dataset Splits | No | The paper mentions "training data" and a "test set" but gives no details about validation splits or a model-selection methodology that explicitly uses a validation set.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) were provided for running the experiments.
Software Dependencies | Yes | All training processes are implemented with LIBSVM [Chang and Lin, 2011] and LIBLINEAR [Fan et al., 2008]. The DT, KNN and NB models are trained using the MATLAB R2016b Statistics and Machine Learning Toolbox, with all parameters set to their defaults.
Experiment Setup | Yes | We set the regularization parameter C=1 for all five models. We set the parameters d=2 for the polynomial kernel and γ=0.1 for the RBF kernel. All attacks computed by PGA are the best among 50 runs. We set the attacker's budget as 30% of the training points.
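The experiment-setup and pseudocode rows can be illustrated with a toy sketch of a budget-constrained label-flip attack against a linear substitute model. This is a hypothetical simplification, not the paper's PGA: scikit-learn's `LinearSVC` stands in for LIBLINEAR, random restarts stand in for the gradient-based flip strategy, and only the stated C=1 and 30% budget are taken from the summary above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

def flip_attack(X_train, y_train, X_test, y_test, budget=0.3, runs=5, seed=0):
    """Greedy random-restart label-flip attack (illustrative only).

    Flips a budgeted fraction of binary training labels and keeps the
    flip set that maximizes the substitute model's test error.
    """
    rng = np.random.default_rng(seed)
    n_flips = int(budget * len(y_train))  # 30% budget as in the setup row
    best_y, best_err = y_train, -1.0
    for _ in range(runs):  # keep the best over several restarts
        idx = rng.choice(len(y_train), size=n_flips, replace=False)
        y_adv = y_train.copy()
        y_adv[idx] = 1 - y_adv[idx]  # flip 0/1 labels
        clf = LinearSVC(C=1.0).fit(X_train, y_adv)  # C=1 as in the setup
        err = 1.0 - clf.score(X_test, y_test)
        if err > best_err:
            best_err, best_y = err, y_adv
    return best_y, best_err

# Synthetic stand-in for a data set like Australian (14 features).
X, y = make_classification(n_samples=400, n_features=14, random_state=0)
y_poisoned, err = flip_attack(X[:300], y[:300], X[300:], y[300:])
```

The restart loop mirrors the summary's "best among 50 runs" selection, scaled down to 5 restarts; the paper's actual PGA instead ascends the gradient of the victim's loss and projects back onto the flip budget.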