Feature Cross-Substitution in Adversarial Classification

Authors: Bo Li, Yevgeniy Vorobeychik

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our insight through extensive experiments, exhibiting potential perils of traditional means for feature selection. Our evaluation uses three data sets: Enron email data [21], Ling-spam data [22], and internet advertisement dataset from the UCI repository [23].
Researcher Affiliation | Academia | Bo Li and Yevgeniy Vorobeychik, Electrical Engineering and Computer Science, Vanderbilt University, {bo.li.2,yevgeniy.vorobeychik}@vanderbilt.edu
Pseudocode | Yes | Figure 3: Left: MILP to compute the solution to (4). Right: SMA iterative algorithm using clustering and constraint generation. (Algorithm 1, SMA(X), is presented in Figure 3, right.)
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Our evaluation uses three data sets: Enron email data [21], Ling-spam data [22], and internet advertisement dataset from the UCI repository [23].
Dataset Splits | Yes | The Enron data set was divided into a training set of 3172 emails and a test set of 2000 emails in each of 5 folds of cross-validation, with an equal number of spam and non-spam instances [21]. The Ling-spam data set was divided into 1158 training and 289 test instances in cross-validation, with five times as many non-spam as spam instances; it contains 1000 features, from which between 5 and 500 were sub-selected for the experiments. Finally, the UCI data set was divided into 476 training and 119 test instances in five-fold cross-validation, with four times as many advertisement as non-advertisement instances.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions solving mixed-integer linear programs but does not specify any software dependencies, libraries, or solvers with version numbers that would be needed for replication.
Experiment Setup | No | The paper describes the overall model and algorithms but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other detailed training configurations in the main text.
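As a quick consistency check on the sizes reported in the Dataset Splits row, the short Python sketch below computes the training fraction per fold and the class counts implied by the stated class ratios. The names `SPLITS`, `train_fraction`, and `class_counts` are hypothetical and chosen for illustration only; the numbers come from the row above, and this is not the authors' splitting code, which is not available.

```python
# Hypothetical helper table: (train, test) sizes per cross-validation fold,
# copied from the Dataset Splits row of the report.
SPLITS = {
    "Enron": (3172, 2000),
    "Ling-spam": (1158, 289),
    "UCI ads": (476, 119),
}


def train_fraction(train: int, test: int) -> float:
    """Fraction of the instances in one fold that are used for training."""
    return train / (train + test)


def class_counts(n: int, majority_to_minority: int) -> tuple:
    """Expected (majority, minority) counts when the majority class is
    `majority_to_minority` times as frequent as the minority class.
    Non-integer results indicate that the stated ratio is not exact."""
    minority = n / (majority_to_minority + 1)
    return n - minority, minority


if __name__ == "__main__":
    for name, (train, test) in SPLITS.items():
        frac = train_fraction(train, test)
        print(f"{name}: {train + test} instances, train fraction {frac:.3f}")
    # Ling-spam: five times as many non-spam as spam instances.
    print("Ling-spam train (non-spam, spam):", class_counts(1158, 5))
    # UCI: four times as many advertisement as non-advertisement instances.
    print("UCI train (ad, non-ad):", class_counts(476, 4))
```

Under these assumptions, the Ling-spam and UCI splits come out at roughly 80/20, which is what five-fold cross-validation would produce, whereas the Enron split (3172/2000) does not. The UCI training size also does not divide evenly under a 4:1 ratio (95.2 non-advertisement instances), so the stated ratio is presumably approximate.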