Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Efficient Label Contamination Attacks Against Black-Box Learning Models
Authors: Mengchen Zhao, Bo An, Wei Gao, Teng Zhang
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies show that PGA significantly outperforms existing baselines and linear learning models are better substitute models than nonlinear ones. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Engineering, Nanyang Technological University, Singapore; ²National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China |
| Pseudocode | Yes | Algorithm 1: Projected Gradient Ascent (PGA) and Algorithm 2: Flip strategy |
| Open Source Code | No | The paper discusses implementing with LIBSVM [Chang and Lin, 2011] and LIBLINEAR [Fan et al., 2008], which are third-party tools. It does not provide concrete access to the source code for the authors' own methodology. |
| Open Datasets | Yes | We will use five public data sets: Australian (690 points, 14 features), W8a (10000 points, 300 features), Spambase (4601 points, 57 features) [Lichman, 2013], Wine (130 points, 14 features) and Skin (5000 points, 3 features)¹. ¹Except Spambase, all data sets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. |
| Dataset Splits | No | The paper mentions 'training data' and 'test set' but does not provide specific details about validation data splits or a methodology for model selection that explicitly uses a validation set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) were provided for running the experiments. |
| Software Dependencies | Yes | All training processes are implemented with LIBSVM [Chang and Lin, 2011] and LIBLINEAR [Fan et al., 2008]. The DT, KNN and NB models are trained using MATLAB R2016b Statistics and Machine Learning Toolbox and all parameters are set by default. |
| Experiment Setup | Yes | We set the regularization parameter C=1 for all five models. We set the parameters d=2 for polynomial kernel and γ=0.1 for RBF kernel. All attacks computed by PGA are the best among 50 runs. We set the attacker's budget as 30% of the training points. |
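
The settings quoted in the Experiment Setup row can be collected into a small configuration, sketched below in Python. This is only an illustration and not the authors' code: the names `SETUP` and `flip_budget` are hypothetical, rounding the budget down to a whole number of points is an assumption, and the training-set size in the usage line is a made-up figure, since the table does not report split sizes.

```python
# Hyperparameters as quoted in the Experiment Setup row.
# Key names are hypothetical; only the values come from the table.
SETUP = {
    "C": 1,                # regularization parameter, shared by all five models
    "poly_degree": 2,      # d for the polynomial kernel
    "rbf_gamma": 0.1,      # gamma for the RBF kernel
    "pga_runs": 50,        # the reported attack is the best among 50 PGA runs
    "budget_percent": 30,  # attacker's budget as a percentage of training points
}


def flip_budget(n_train: int, percent: int = SETUP["budget_percent"]) -> int:
    """Return how many training labels the attacker may contaminate.

    Integer arithmetic avoids floating-point surprises. Rounding down
    is an assumption; the table does not say how fractions are handled.
    """
    if n_train < 0:
        raise ValueError("n_train must be non-negative")
    return n_train * percent // 100


if __name__ == "__main__":
    # Hypothetical training-set size, chosen only for illustration.
    print(flip_budget(1000))
```

With a hypothetical training set of 1000 points, the helper yields a budget of 300 contaminated labels.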