ProTransformer: Robustify Transformers via Plug-and-Play Paradigm

Authors: Zhichao Hou, Weizhi Gao, Yuchen Shen, Feiyi Wang, Xiaorui Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments and ablation studies, we demonstrate that our ProTransformer significantly enhances the robustness of transformer models across a variety of prediction tasks, attack mechanisms, backbone architectures, and data domains. In this section, we evaluate the effectiveness of the proposed ProAttention and ProTransformer under classic text attacks on pre-trained language models, and two prompting-based attacks (prompt attack and jailbreak attack) in the context of LLMs with comprehensive ablation studies.
Researcher Affiliation | Collaboration | Zhichao Hou^1, Weizhi Gao^1, Yuchen Shen^2, Feiyi Wang^3, Xiaorui Liu^1 (1: North Carolina State University, 2: Carnegie Mellon University, 3: Oak Ridge National Laboratory)
Pseudocode | Yes | The PyTorch implementation of ProAttention using the MCP penalty is shown in Algorithm 1; the complete pseudocode for the other penalties is presented in Appendix A. (An illustrative sketch of this reweighted aggregation appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/chrishzc/ProTransformer.
Open Datasets | Yes | For topic classification, we use the AG's News Corpus (AGNEWS) [51]. For sentiment analysis, we utilize two widely-used datasets: the Internet Movie Database (IMDB) [52] and the Stanford Sentiment Treebank (SST-2) [53]. For textual entailment, we make use of Recognizing Textual Entailment (RTE) from the General Language Understanding Evaluation (GLUE) benchmark [54]. For the jailbreak attack, we select a new dataset, Behaviors, introduced in [55]. AG's News Corpus (AGNEWS) [51]: a collection of more than 1 million news articles. (A dataset-loading sketch appears after the table.)
Dataset Splits | Yes | The CIFAR-10 dataset ... is divided into two parts: 50,000 training images and 10,000 test images. ImageNet-1K ... contains 1,281,167 training images, 50,000 validation images and 100,000 test images. (A split-loading sketch appears after the table.)
Hardware Specification | No | The paper mentions GPUs in the NeurIPS checklist but does not provide specific models or detailed hardware specifications for the experimental setup, nor does it specify CPU models, memory, or cloud instance types.
Software Dependencies | No | PyTorch is mentioned for implementing ProAttention (Algorithm 1), but no specific version number is provided. Other tools, such as the TextAttack framework [63] and PromptBench [15], are mentioned without version details. This is insufficient to reproduce the software environment.
Experiment Setup | Yes | For our ProTransformer, we set the default number of ProAttention layers to K = 3 since it can quickly converge to a reasonable precision within 3 layers. Finally, we tune δ (default 1) or γ (default 4) in the penalties (Huber and MCP loss) to obtain the optimal parameters. For the text attack setting, we follow the TextAttack framework [63]. For the prompt attack, we follow the setting in PromptBench [15]. For the GCG-based jailbreak attack, we follow the setting in [61]. (A configuration and attack-evaluation sketch appears after the table.)
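
To make the Pseudocode row concrete, the following PyTorch sketch approximates the kind of iteratively reweighted, MCP-style aggregation that ProAttention performs. The function name, tensor shapes, and the exact weight formula are illustrative assumptions; the paper's Algorithm 1 and Appendix A remain the authoritative reference.

```python
import torch
import torch.nn.functional as F

def pro_attention_mcp_sketch(Q, K, V, gamma=4.0, num_iters=3):
    """Illustrative robust attention: replace the softmax-weighted mean of the
    value vectors with an iteratively reweighted estimate that suppresses
    outlying values (an MCP-style weight). `num_iters` plays the role of the
    paper's K = 3 ProAttention layers; `gamma` mirrors the MCP default of 4."""
    d = Q.shape[-1]
    attn = F.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)    # (..., Lq, Lk)
    out = attn @ V                                                  # vanilla attention as initialization
    for _ in range(num_iters):
        # distance of every value vector from the current per-query estimate
        resid = (out.unsqueeze(-2) - V.unsqueeze(-3)).norm(dim=-1)  # (..., Lq, Lk)
        # MCP-style reweighting: linearly decaying weight, hard rejection beyond gamma
        w = attn * torch.clamp(1.0 - resid / gamma, min=0.0)
        w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-12)
        out = w @ V                                                 # robust weighted mean
    return out
```

Each iteration starts from the vanilla softmax output and progressively down-weights value vectors whose residual exceeds γ, which is the sense in which the module can be dropped into an existing attention block.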
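
As a pointer for the Open Datasets row, the four classification and entailment datasets named above are all hosted on the Hugging Face Hub; the loading sketch below is an assumed convenience path, not necessarily the authors' data pipeline.

```python
from datasets import load_dataset  # Hugging Face `datasets` package

agnews = load_dataset("ag_news")        # topic classification (AGNEWS)
imdb   = load_dataset("imdb")           # sentiment analysis (IMDB)
sst2   = load_dataset("glue", "sst2")   # sentiment analysis (SST-2)
rte    = load_dataset("glue", "rte")    # textual entailment (RTE, GLUE)
```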
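
For the Dataset Splits row, the quoted CIFAR-10 partition matches the standard torchvision split; the sketch below assumes torchvision is used, which the paper does not state.

```python
import torchvision
from torchvision import transforms

transform = transforms.ToTensor()
cifar_train = torchvision.datasets.CIFAR10("./data", train=True,  download=True, transform=transform)
cifar_test  = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
print(len(cifar_train), len(cifar_test))  # 50000 10000, matching the quoted split
# ImageNet-1K requires a manual download; torchvision.datasets.ImageNet exposes the
# standard train/val splits once the archives are in place.
```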
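
Finally, for the Experiment Setup row, the snippet below collects the quoted defaults (K = 3, δ = 1, γ = 4) into a configuration dict and shows one assumed way to run a TextAttack-style evaluation on AGNEWS. The attack recipe, model checkpoint, and number of examples are placeholders, not the paper's exact script.

```python
import textattack
import transformers

# Defaults quoted in the paper's setup
pro_attention_config = {
    "num_layers": 3,  # K = 3 ProAttention layers
    "delta": 1.0,     # Huber penalty parameter (default 1)
    "gamma": 4.0,     # MCP penalty parameter (default 4)
}

# Placeholder victim model; a checkpoint fine-tuned on AGNEWS would be substituted here.
model = transformers.AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-uncased")
wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# One classic text attack from the TextAttack framework (recipe choice is illustrative)
attack = textattack.attack_recipes.TextFoolerJin2019.build(wrapper)
dataset = textattack.datasets.HuggingFaceDataset("ag_news", split="test")
attacker = textattack.Attacker(attack, dataset, textattack.AttackArgs(num_examples=100))
attacker.attack_dataset()
```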