Attack-free Evaluating and Enhancing Adversarial Robustness on Categorical Data
Authors: Yujun Zhou, Yufei Han, Haomin Zhuang, Hongyan Bao, Xiangliang Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive empirical studies over categorical datasets of various application domains. The results affirm the efficacy of both IGSG and IGSG-based regularization. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA 2INRIA, France 3King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. |
| Pseudocode | Yes | Algorithm 1 FSGS for general categorical data; Algorithm 2 OMPGS for general categorical data; Algorithm 3 FSGS + PGD for mixed-type data; Algorithm 4 OMPGS + PGD for mixed-type data |
| Open Source Code | Yes | The code is available at https://github.com/YujunZhou/IGSG. |
| Open Datasets | Yes | Splice-junction Gene Sequences (Splice) (Noordewier et al., 1990). Windows PE Malware Detection (PEDec) (Bao et al., 2021). Census-Income (KDD) Data (Census) (Lane & Kohavi, 2000). |
| Dataset Splits | No | For Splice and PEDec, we use 90% and 10% of the data samples as the training and testing set to measure the adversarial classification accuracy. For Census, we use the testing and the training set given by (Lane & Kohavi, 2000), i.e., 199,523 for training and 99,762 for testing. No explicit mention of a separate validation set split is provided. |
| Hardware Specification | Yes | All our implementations are conducted in the Python library PyTorch on a Linux server with a single GPU (NVIDIA V100). |
| Software Dependencies | No | The paper mentions 'Python library PyTorch' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The Settings of the Hyper-parameters in the Training Phase: We set the learning rates to 0.07, 0.2, and 0.008 for Splice, PEDec, and Census datasets, respectively. For the hyper-parameters β of the proposed IGSGreg method in Eq.8, we empirically choose 0.01, 1, 10 for MLP on the three datasets respectively and 100 for Transformer on all datasets. In the case of the PGD-1 attack in the Adv Train, AFD, and TRADES methods, we set ϵ to be 5 for the three datasets. The attack consists of 20 iterations, with the attack step size set to ϵ/10. For the training epochs, we execute 3000, 180, and 100 epochs on Splice, PEDec, and Census respectively. |
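The PGD settings quoted above (ϵ = 5, 20 attack iterations, step size ϵ/10) can be illustrated with a toy sketch. The snippet below applies an L∞-style PGD loop to a single logistic model with an analytic gradient; it is an illustrative stand-in under those assumptions, not the paper's actual mixed-type PGD implementation, and the model (`pgd_attack_linear`, `sigmoid`) is hypothetical.

```python
import math

# Hyper-parameters quoted from the paper's setup
EPSILON = 5.0          # perturbation budget epsilon
N_ITERS = 20           # attack iterations
STEP = EPSILON / 10.0  # attack step size (epsilon / 10)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pgd_attack_linear(x, y, w, b):
    """Toy L-inf PGD against a logistic model p = sigmoid(w.x + b).

    Illustrative only: the paper attacks neural nets on mixed-type
    data; here the binary cross-entropy gradient w.r.t. x is analytic,
    d(loss)/dx_i = (p - y) * w_i, so no autograd framework is needed.
    """
    x_adv = list(x)
    for _ in range(N_ITERS):
        z = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        p = sigmoid(z)
        for i, wi in enumerate(w):
            g = (p - y) * wi                     # gradient of the loss w.r.t. x_i
            step = STEP if g > 0 else (-STEP if g < 0 else 0.0)
            # gradient-sign ascent step, projected back onto the epsilon ball
            x_adv[i] = min(max(x_adv[i] + step, x[i] - EPSILON), x[i] + EPSILON)
    return x_adv
```

Each iteration takes a signed-gradient step of size ϵ/10 and clips the perturbed point back into the ϵ-ball around the clean input, mirroring the step-size and projection schedule reported in the setup.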