ProTransformer: Robustify Transformers via Plug-and-Play Paradigm

Authors: Zhichao Hou, Weizhi Gao, Yuchen Shen, Feiyi Wang, Xiaorui Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments and ablation studies, we demonstrate that our ProTransformer significantly enhances the robustness of transformer models across a variety of prediction tasks, attack mechanisms, backbone architectures, and data domains. In this section, we evaluate the effectiveness of the proposed ProAttention and ProTransformer under classic text attacks on pre-trained language models, and two prompting-based attacks (prompt attack and jailbreak attack) in the context of LLMs with comprehensive ablation studies.
Researcher Affiliation | Collaboration | Zhichao Hou^1, Weizhi Gao^1, Yuchen Shen^2, Feiyi Wang^3, Xiaorui Liu^1 (1: North Carolina State University, 2: Carnegie Mellon University, 3: Oak Ridge National Laboratory)
Pseudocode | Yes | The PyTorch implementation of ProAttention using the MCP penalty is shown in Algorithm 1; the complete pseudocode for the other penalties is presented in Appendix A. (An illustrative sketch of this reweighted aggregation appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/chrishzc/ProTransformer.
Open Datasets | Yes | For topic classification, we use the AG's News Corpus (AGNEWS) [51]. For sentiment analysis, we utilize two widely-used datasets: the Internet Movie Database (IMDB) [52] and the Stanford Sentiment Treebank (SST-2) [53]. For textual entailment, we make use of Recognizing Textual Entailment (RTE) from the General Language Understanding Evaluation (GLUE) benchmark [54]. For the jailbreak attack, we select a new dataset, Behaviors, introduced in [55]. AG's News Corpus (AGNEWS) [51]: a collection of more than 1 million news articles. (A dataset-loading sketch appears after the table.)
Dataset Splits | Yes | The CIFAR-10 dataset ... is divided into two parts: 50,000 training images and 10,000 test images. ImageNet-1K ... contains 1,281,167 training images, 50,000 validation images and 100,000 test images. (A split-loading sketch appears after the table.)
Hardware Specification | No | The paper mentions GPUs in the NeurIPS checklist but does not provide specific models or detailed hardware specifications for the experimental setup, nor does it specify CPU models, memory, or cloud instance types.
Software Dependencies | No | PyTorch is mentioned for implementing ProAttention (Algorithm 1), but no specific version number is provided. Other tools, such as the TextAttack framework [63] and PromptBench [15], are mentioned without version details. This is insufficient to reproduce the software environment.
Experiment Setup | Yes | For our ProTransformer, we set the default number of ProAttention layers to K = 3 since it can quickly converge to a reasonable precision within 3 layers. Finally, we tune δ (default 1) or γ (default 4) in the penalties (Huber and MCP loss) to obtain the optimal parameters. For the text attack setting, we follow the TextAttack framework [63]. For the prompt attack, we follow the setting in PromptBench [15]. For the GCG-based jailbreak attack, we follow the setting in [61]. (A configuration and attack-evaluation sketch appears after the table.)
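
To make the Pseudocode row concrete, the following PyTorch sketch approximates the kind of iteratively reweighted, MCP-style aggregation that ProAttention performs. The function name, tensor shapes, and the exact weight formula are illustrative assumptions; the paper's Algorithm 1 and Appendix A remain the authoritative reference.

```python
import torch
import torch.nn.functional as F

def pro_attention_mcp_sketch(Q, K, V, gamma=4.0, num_iters=3):
    """Illustrative robust attention: replace the softmax-weighted mean of the
    value vectors with an iteratively reweighted estimate that suppresses
    outlying values (an MCP-style weight). `num_iters` plays the role of the
    paper's K = 3 ProAttention layers; `gamma` mirrors the MCP default of 4."""
    d = Q.shape[-1]
    attn = F.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)    # (..., Lq, Lk)
    out = attn @ V                                                  # vanilla attention as initialization
    for _ in range(num_iters):
        # distance of every value vector from the current per-query estimate
        resid = (out.unsqueeze(-2) - V.unsqueeze(-3)).norm(dim=-1)  # (..., Lq, Lk)
        # MCP-style reweighting: linearly decaying weight, hard rejection beyond gamma
        w = attn * torch.clamp(1.0 - resid / gamma, min=0.0)
        w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-12)
        out = w @ V                                                 # robust weighted mean
    return out
```

Each iteration starts from the vanilla softmax output and progressively down-weights value vectors whose residual exceeds γ, which is the sense in which the module can be dropped into an existing attention block.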
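
As a pointer for the Open Datasets row, the four classification and entailment datasets named above are all hosted on the Hugging Face Hub; the loading sketch below is an assumed convenience path, not necessarily the authors' data pipeline.

```python
from datasets import load_dataset  # Hugging Face `datasets` package

agnews = load_dataset("ag_news")        # topic classification (AGNEWS)
imdb   = load_dataset("imdb")           # sentiment analysis (IMDB)
sst2   = load_dataset("glue", "sst2")   # sentiment analysis (SST-2)
rte    = load_dataset("glue", "rte")    # textual entailment (RTE, GLUE)
```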
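
For the Dataset Splits row, the quoted CIFAR-10 partition matches the standard torchvision split; the sketch below assumes torchvision is used, which the paper does not state.

```python
import torchvision
from torchvision import transforms

transform = transforms.ToTensor()
cifar_train = torchvision.datasets.CIFAR10("./data", train=True,  download=True, transform=transform)
cifar_test  = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
print(len(cifar_train), len(cifar_test))  # 50000 10000, matching the quoted split
# ImageNet-1K requires a manual download; torchvision.datasets.ImageNet exposes the
# standard train/val splits once the archives are in place.
```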
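
Finally, for the Experiment Setup row, the snippet below collects the quoted defaults (K = 3, δ = 1, γ = 4) into a configuration dict and shows one assumed way to run a TextAttack-style evaluation on AGNEWS. The attack recipe, model checkpoint, and number of examples are placeholders, not the paper's exact script.

```python
import textattack
import transformers

# Defaults quoted in the paper's setup
pro_attention_config = {
    "num_layers": 3,  # K = 3 ProAttention layers
    "delta": 1.0,     # Huber penalty parameter (default 1)
    "gamma": 4.0,     # MCP penalty parameter (default 4)
}

# Placeholder victim model; a checkpoint fine-tuned on AGNEWS would be substituted here.
model = transformers.AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# One classic text attack from the TextAttack framework (recipe choice is illustrative)
attack = textattack.attack_recipes.TextFoolerJin2019.build(wrapper)
dataset = textattack.datasets.HuggingFaceDataset("ag_news", split="test")
attacker = textattack.Attacker(attack, dataset, textattack.AttackArgs(num_examples=100))
attacker.attack_dataset()
```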