Exploring Adversarial Robustness of Deep State Space Models
Authors: Biqing Qi, Yiang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To respond to Q1 and Q2, we employ AT using common AT frameworks on various structural variants of SSMs. These include implementations such as Normal Plus Low-Rank (NPLR) decomposition (S4) [3], diagonalization (DSS) [28, 26], MIMO extension (S5) [27], integration of attention mechanisms (Mega) [19], and data-dependent SSM extension (Mamba) [20]. Subsequently, we conduct comprehensive robustness evaluations to assess their performance. |
| Researcher Affiliation | Collaboration | Biqing Qi1,2, Yiang Luo3, Junqi Gao4, Pengfei Li4, Kai Tian1, Zhiyuan Ma1, Bowen Zhou1,2. 1 Department of Electronic Engineering, Tsinghua University; 2 Shanghai Artificial Intelligence Laboratory; 3 Department of Control Science and Engineering, Harbin Institute of Technology; 4 School of Mathematics, Harbin Institute of Technology |
| Pseudocode | No | No section or figure labeled "Pseudocode" or "Algorithm" was found. |
| Open Source Code | Yes | Our code is available at Robustness-of-SSM. |
| Open Datasets | Yes | We adopt the ST and two most commonly used AT frameworks, PGD-AT [16] and TRADES [21], as well as two more efficient and advanced adversarial training frameworks, Free AT [30] and YOPO [31], to conduct experiments on the MNIST, CIFAR-10, and Tiny-ImageNet datasets. |
| Dataset Splits | No | After each training epoch of AT, we conduct adversarial testing on both the training and test sets to evaluate robustness and measure generalization on adversarial examples. The paper mentions training and test sets but does not explicitly state a separate validation set. |
| Hardware Specification | Yes | All experiments were implemented on multiple NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper describes the general training frameworks used (e.g., PGD-AT, TRADES) but does not specify software dependencies with version numbers like PyTorch or TensorFlow versions. |
| Experiment Setup | Yes | For AT, we utilize a 10-step ℓ∞ PGD (PGD-10) as the attack method. Following [21], on MNIST, we set the adversarial budget to ϵ = 0.3, attack step size α = 0.04, the KL divergence regularizer coefficient for TRADES β = 1.0, and the training epoch to 100. On CIFAR-10 and Tiny-ImageNet, we set ϵ = 0.031, α = 0.007, β = 6.0, and the training epoch to 180. ... Optimizer AdamW, Batch Size 256/128/64, Learning Rate 0.001, Scheduler cosine, Weight Decay 0.0002. |
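The attack configuration in the Experiment Setup row can be sketched as follows. This is a minimal, hedged illustration of a 10-step ℓ∞ PGD with the paper's CIFAR-10 budget (ϵ = 0.031, α = 0.007); a toy one-dimensional differentiable model with an analytic gradient stands in for the actual SSM classifier, and all function names here are illustrative, not from the authors' code.

```python
# PGD-10 sketch: iterated signed-gradient ascent on the loss,
# projected back onto the l_inf ball of radius eps around the
# original input, and clipped to the valid pixel range [0, 1].

def loss(x, w=3.0, y=0.5):
    """Squared error of a toy linear model w*x against target y."""
    return (w * x - y) ** 2

def grad(x, w=3.0, y=0.5):
    """Analytic gradient of the loss with respect to the input x."""
    return 2.0 * w * (w * x - y)

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(x0, eps=0.031, alpha=0.007, steps=10):
    """10-step l_inf PGD with the paper's CIFAR-10 hyperparameters."""
    x = x0
    for _ in range(steps):
        x = x + alpha * sign(grad(x))        # ascend the loss
        x = min(max(x, x0 - eps), x0 + eps)  # project onto the eps-ball
        x = min(max(x, 0.0), 1.0)            # keep a valid pixel value
    return x

x0 = 0.2
x_adv = pgd_attack(x0)
```

In a real PGD-AT loop the model is then trained on `x_adv` instead of `x0`; TRADES additionally weights a KL term between clean and adversarial predictions by the β listed above.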
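For readers unfamiliar with the SSM variants listed in the Research Type row, the shared core is a linear state-space recurrence; diagonal variants such as DSS restrict the state matrix A to be diagonal. The pure-Python toy below only shows that scan: real models use learned (often complex-valued) parameters, discretization, and a parallel convolutional mode, and every name here is illustrative.

```python
# Diagonal SSM recurrence: x_k = A x_{k-1} + B u_k, y_k = C x_k,
# with A diagonal so each state channel evolves independently.

def diagonal_ssm(u, a, b, c):
    """Run a single-input single-output diagonal SSM over sequence u.

    a, b, c are length-N lists: per-channel state decay, input
    projection, and output projection. Returns the output sequence.
    """
    n = len(a)
    x = [0.0] * n  # hidden state, one scalar per diagonal channel
    ys = []
    for u_k in u:
        x = [a[i] * x[i] + b[i] * u_k for i in range(n)]
        ys.append(sum(c[i] * x[i] for i in range(n)))
    return ys

# Impulse response of a two-channel diagonal SSM.
y = diagonal_ssm([1.0, 0.0, 0.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, -0.5])
```

NPLR (S4) replaces the diagonal A with a normal-plus-low-rank matrix, S5 extends the scan to MIMO inputs, and Mamba makes A, B, C input-dependent; the recurrence skeleton is the same.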