BERT Lost Patience Won't Be Robust to Adversarial Slowdown

Authors: Zachary Coalson, Gabriel Ritter, Rakesh Bobba, Sanghyun Hong

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we systematically evaluate the robustness of multi-exit language models against adversarial slowdown. To audit their robustness, we design a slowdown attack that generates natural adversarial text bypassing early-exit points. We use the resulting WAFFLE attack as a vehicle to conduct a comprehensive evaluation of three multi-exit mechanisms with the GLUE benchmark against adversarial slowdown."
Researcher Affiliation | Academia | "Zachary Coalson, Gabriel Ritter, Rakesh Bobba, and Sanghyun Hong. Oregon State University. {coalsonz, ritterg, bobbar, hongsa}@oregonstate.edu"
Pseudocode | Yes | "Algorithm 1 WAFFLE (based on TextFooler)" and "Algorithm 2 WAFFLE (based on A2T)"
Open Source Code | Yes | "Our code is available at: https://github.com/ztcoalson/WAFFLE"
Open Datasets | Yes | "Tasks. We evaluate the multi-exit language models trained on seven classification tasks chosen from the GLUE benchmark [36]: RTE, MRPC, MNLI, QNLI, QQP, SST-2, and CoLA."
Dataset Splits | Yes | "We take the pre-trained language models (i.e., BERT and ALBERT) from Hugging Face and fine-tune them on GLUE benchmarks. We fine-tune them on seven different GLUE tasks for five epochs. We choose a batch size from {32, 64, 128} and a learning rate from {1e-5, 2e-5, 3e-5, 4e-5, 5e-5}. We perform hyper-parameter sweeping over all the combinations and select the models that provide the best accuracy for each task."
Hardware Specification | Yes | "Our experiments run on a machine equipped with an Intel Xeon processor with 48 cores, 64GB memory, and 8 Nvidia A40 GPUs." "We run our experiments on a single Tesla V100 GPU."
Software Dependencies | Yes | "We implement all the multi-exit mechanisms and our attacks using Python v3.9 and PyTorch v1.10 with CUDA 11.7 support for accelerating computations on GPUs."
Experiment Setup | Yes | "We choose a batch size from {32, 64, 128} and a learning rate from {1e-5, 2e-5, 3e-5, 4e-5, 5e-5}. We perform hyper-parameter sweeping over all the combinations and select the models that provide the best accuracy for each task. In DeeBERT, we pick the entropy threshold that offers a 1.5× computational speedup. In PABEE, we choose a patience value of 6. In PastFuture, we set the entropy values where we achieve a 2× speedup."
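The two exit criteria quoted in the Experiment Setup row (DeeBERT's entropy threshold and PABEE's patience) can be sketched as decision rules over the per-layer internal classifiers' logits. This is an illustrative sketch, not the authors' implementation; the function names, the list-of-logits interface, and the fallback to the final classifier are assumptions.

```python
import torch

def entropy(logits):
    # Shannon entropy of the softmax distribution; low entropy = confident.
    probs = torch.softmax(logits, dim=-1)
    return -(probs * probs.log()).sum(dim=-1)

def deebert_exit(exit_logits, threshold):
    """Entropy rule (DeeBERT-style): stop at the first internal classifier
    whose prediction entropy falls below `threshold`."""
    for layer, logits in enumerate(exit_logits):
        if entropy(logits).item() < threshold:
            return logits.argmax(dim=-1).item(), layer
    # No exit fired: fall back to the final classifier.
    return exit_logits[-1].argmax(dim=-1).item(), len(exit_logits) - 1

def pabee_exit(exit_logits, patience):
    """Patience rule (PABEE-style): stop once `patience` consecutive
    internal classifiers agree on the same label."""
    streak, prev = 0, None
    for layer, logits in enumerate(exit_logits):
        pred = logits.argmax(dim=-1).item()
        streak = streak + 1 if pred == prev else 1
        prev = pred
        if streak >= patience:
            return pred, layer
    return prev, len(exit_logits) - 1
```

A slowdown attack in this setting perturbs the input so that early layers stay high-entropy (or keep disagreeing), pushing the exit to later layers and inflating inference cost.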
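The hyper-parameter sweep quoted in the Dataset Splits and Experiment Setup rows is a plain grid search over batch size and learning rate. A minimal sketch, where `train_eval` is a hypothetical stand-in for fine-tuning a model for five epochs and returning its dev-set accuracy:

```python
from itertools import product

# Search space quoted in the paper.
BATCH_SIZES = [32, 64, 128]
LEARNING_RATES = [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]

def sweep(train_eval, batch_sizes=BATCH_SIZES, learning_rates=LEARNING_RATES):
    """Try every (batch size, learning rate) combination and keep the
    configuration whose fine-tuned model scores the best accuracy."""
    return max(product(batch_sizes, learning_rates),
               key=lambda cfg: train_eval(*cfg))
```

With 3 batch sizes and 5 learning rates, this evaluates 15 fine-tuning runs per GLUE task and keeps the best-scoring configuration.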