Attack-free Evaluating and Enhancing Adversarial Robustness on Categorical Data
Authors: Yujun Zhou, Yufei Han, Haomin Zhuang, Hongyan Bao, Xiangliang Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive empirical studies over categorical datasets of various application domains. The results affirm the efficacy of both IGSG and IGSG-based regularization. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA 2INRIA, France 3King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. |
| Pseudocode | Yes | Algorithm 1 FSGS for general categorical data; Algorithm 2 OMPGS for general categorical data; Algorithm 3 FSGS + PGD for mixed-type data; Algorithm 4 OMPGS + PGD for mixed-type data |
| Open Source Code | Yes | The code is available at https://github.com/YujunZhou/IGSG. |
| Open Datasets | Yes | Splice-junction Gene Sequences (Splice) (Noordewier et al., 1990). Windows PE Malware Detection (PEDec) (Bao et al., 2021). Census-Income (KDD) Data (Census) (Lane & Kohavi, 2000). |
| Dataset Splits | No | For Splice and PEDec, we use 90% and 10% of the data samples as the training and testing set to measure the adversarial classification accuracy. For Census, we use the testing and the training set given by (Lane & Kohavi, 2000), i.e., 199,523 for training and 99,762 for testing. No explicit mention of a separate validation set split is provided. |
| Hardware Specification | Yes | All our implementations are conducted in the Python library PyTorch on a Linux server with a single GPU (NVIDIA V100). |
| Software Dependencies | No | The paper mentions 'Python library PyTorch' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The Settings of the Hyper-parameters in the Training Phase: We set the learning rates to 0.07, 0.2, and 0.008 for Splice, PEDec, and Census datasets, respectively. For the hyper-parameters β of the proposed IGSGreg method in Eq.8, we empirically choose 0.01, 1, 10 for MLP on the three datasets respectively and 100 for Transformer on all datasets. In the case of the PGD-1 attack in the Adv Train, AFD, and TRADES methods, we set ϵ to be 5 for the three datasets. The attack consists of 20 iterations, with the attack step size set to ϵ/10. For the training epochs, we execute 3000, 180, and 100 epochs on Splice, PEDec, and Census respectively. |
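The PGD settings quoted above (ϵ = 5, 20 attack iterations, step size ϵ/10) can be illustrated with a toy sketch. The snippet below applies an L∞-style PGD loop to a single logistic model with an analytic gradient; it is an illustrative stand-in under those assumptions, not the paper's actual mixed-type PGD implementation, and the model (`pgd_attack_linear`, `sigmoid`) is hypothetical.

```python
import math

# Hyper-parameters quoted from the paper's setup
EPSILON = 5.0          # perturbation budget epsilon
N_ITERS = 20           # attack iterations
STEP = EPSILON / 10.0  # attack step size (epsilon / 10)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pgd_attack_linear(x, y, w, b):
    """Toy L-inf PGD against a logistic model p = sigmoid(w.x + b).

    Illustrative only: the paper attacks neural nets on mixed-type
    data; here the binary cross-entropy gradient w.r.t. x is analytic,
    d(loss)/dx_i = (p - y) * w_i, so no autograd framework is needed.
    """
    x_adv = list(x)
    for _ in range(N_ITERS):
        z = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        p = sigmoid(z)
        for i, wi in enumerate(w):
            g = (p - y) * wi                     # gradient of the loss w.r.t. x_i
            step = STEP if g > 0 else (-STEP if g < 0 else 0.0)
            # gradient-sign ascent step, projected back onto the epsilon ball
            x_adv[i] = min(max(x_adv[i] + step, x[i] - EPSILON), x[i] + EPSILON)
    return x_adv
```

Each iteration takes a signed-gradient step of size ϵ/10 and clips the perturbed point back into the ϵ-ball around the clean input, mirroring the step-size and projection schedule reported in the setup.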