Feature Cross-Substitution in Adversarial Classification
Authors: Bo Li, Yevgeniy Vorobeychik
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our insight through extensive experiments, exhibiting potential perils of traditional means for feature selection. Our evaluation uses three data sets: Enron email data [21], Ling-spam data [22], and internet advertisement dataset from the UCI repository [23]. |
| Researcher Affiliation | Academia | Bo Li and Yevgeniy Vorobeychik; Electrical Engineering and Computer Science, Vanderbilt University; {bo.li.2,yevgeniy.vorobeychik}@vanderbilt.edu |
| Pseudocode | Yes | Figure 3: Left: MILP to compute solution to (4). Right: SMA iterative algorithm using clustering and constraint generation. Algorithm 1, SMA(X), is presented in Figure 3 (right). |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Our evaluation uses three data sets: Enron email data [21], Ling-spam data [22], and internet advertisement dataset from the UCI repository [23]. |
| Dataset Splits | Yes | The Enron data set was divided into training set of 3172 and a test set of 2000 emails in each of 5 folds of cross-validation, with an equal number of spam and non-spam instances [21]. The Ling-spam data set was divided into 1158 instances for training and 289 for test in cross-validation with five times as much non-spam as spam, and contains 1000 features from which between 5 and 500 were sub-selected for the experiments. Finally, the UCI data set was divided into 476 training and 119 test instances in five-fold cross validation, with four times as many advertisement as non-advertisement instances. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions solving mixed-integer linear programs but does not specify any particular software dependencies, libraries, or solvers with version numbers that would be needed for replication. |
| Experiment Setup | No | The paper describes the overall model and algorithms but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other detailed training configurations in the main text. |
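
The split sizes quoted in the Dataset Splits row can be sanity-checked with a little arithmetic. The sketch below is not from the paper; the names `SPLITS`, `held_out_fraction`, and `summarize` are hypothetical, and the only inputs are the training and test counts quoted in the table.

```python
# Train/test counts as quoted in the Dataset Splits row (per fold).
SPLITS = {
    "Enron": (3172, 2000),
    "Ling-spam": (1158, 289),
    "UCI ads": (476, 119),
}


def held_out_fraction(train: int, test: int) -> float:
    """Return the share of instances held out for testing."""
    return test / (train + test)


def summarize(splits: dict) -> dict:
    """Map each data set to (total instances, test fraction rounded to 3 places)."""
    return {
        name: (train + test, round(held_out_fraction(train, test), 3))
        for name, (train, test) in splits.items()
    }


if __name__ == "__main__":
    for name, (total, fraction) in summarize(SPLITS).items():
        print(f"{name}: total={total}, test fraction={fraction}")
```

With these counts, Ling-spam and the UCI data hold out roughly one fifth of their instances per fold, which is what a standard five-fold partition would give. The Enron counts hold out closer to two fifths, and the quoted text does not say how those folds were formed.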