BERT Lost Patience Won't Be Robust to Adversarial Slowdown
Authors: Zachary Coalson, Gabriel Ritter, Rakesh Bobba, Sanghyun Hong
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we systematically evaluate the robustness of multi-exit language models against adversarial slowdown. To audit their robustness, we design a slowdown attack that generates natural adversarial text bypassing early-exit points. We use the resulting WAFFLE attack as a vehicle to conduct a comprehensive evaluation of three multi-exit mechanisms with the GLUE benchmark against adversarial slowdown. |
| Researcher Affiliation | Academia | Zachary Coalson, Gabriel Ritter, Rakesh Bobba, and Sanghyun Hong Oregon State University {coalsonz, ritterg, bobbar, hongsa}@oregonstate.edu |
| Pseudocode | Yes | Algorithm 1 WAFFLE (based on TextFooler) and Algorithm 2 WAFFLE (based on A2T) (a sketch of the attack loop follows the table) |
| Open Source Code | Yes | Our code is available at: https://github.com/ztcoalson/WAFFLE |
| Open Datasets | Yes | Tasks. We evaluate the multi-exit language models trained on seven classification tasks chosen from the GLUE benchmark [36]: RTE, MRPC, MNLI, QNLI, QQP, SST-2, and CoLA. |
| Dataset Splits | Yes | We take the pre-trained language models (i.e., BERT and ALBERT) from Hugging Face and fine-tune them on GLUE benchmarks. Our experiments run on a machine equipped with Intel Xeon Processor with 48 cores, 64GB memory and 8 Nvidia A40 GPUs. We fine-tune them on seven different GLUE tasks for five epochs. We choose a batch-size from 32, 64, 128 and a learning rate from 1e-5, 2e-5, 3e-5, 4e-5, 5e-5. We perform hyper-parameter sweeping over all the combinations and select the models that provide the best accuracy for each task. (a sketch of this sweep follows the table) |
| Hardware Specification | Yes | Our experiments run on a machine equipped with Intel Xeon Processor with 48 cores, 64GB memory and 8 Nvidia A40 GPUs. We run our experiments on a single Tesla V100 GPU. |
| Software Dependencies | Yes | We implement all the multi-exit mechanisms and our attacks using Python v3.9 and PyTorch v1.10 that supports CUDA 11.7 for accelerating computations by using GPUs. |
| Experiment Setup | Yes | We choose a batch-size from 32, 64, 128 and a learning rate from 1e-5, 2e-5, 3e-5, 4e-5, 5e-5. We perform hyper-parameter sweeping over all the combinations and select the models that provide the best accuracy for each task. In DeeBERT, we pick the entropy threshold that offers 1.5× computational speedup. In PABEE, we choose the patience value of 6. In PastFuture, we set the entropy values where we achieve 2× speedup. (a sketch of the entropy- and patience-based exit rules follows the table) |
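
The Pseudocode row cites Algorithm 1 (WAFFLE based on TextFooler) and Algorithm 2 (WAFFLE based on A2T). The snippet below is only a minimal sketch of the general idea behind such a slowdown attack: a greedy word-substitution loop that keeps whichever swap pushes the exit point deepest. The `exit_layer` and `candidate_substitutes` callables are hypothetical placeholders, not the paper's implementation; the real algorithms are in the PDF and the linked repository.

```python
# Hypothetical sketch of a slowdown-style word-substitution attack.
# `exit_layer` and `candidate_substitutes` are assumed helpers, not the
# authors' WAFFLE implementation.

from typing import Callable, List


def waffle_sketch(
    text: str,
    exit_layer: Callable[[str], int],                    # layer at which the multi-exit model exits
    candidate_substitutes: Callable[[str], List[str]],   # e.g., TextFooler-style synonym candidates
    max_swaps: int = 5,
) -> str:
    """Greedily swap words so that the model exits as late as possible."""
    words = text.split()
    best_depth = exit_layer(text)

    for _ in range(max_swaps):
        best_swap = None
        for i, word in enumerate(words):
            for cand in candidate_substitutes(word):
                trial = words[:i] + [cand] + words[i + 1:]
                depth = exit_layer(" ".join(trial))
                if depth > best_depth:
                    best_depth, best_swap = depth, (i, cand)
        if best_swap is None:   # no remaining substitution delays the exit further
            break
        i, cand = best_swap
        words[i] = cand
    return " ".join(words)
```

In the paper's setting, `exit_layer` would query a multi-exit BERT/ALBERT and the candidate set would come from a TextFooler- or A2T-style substitution search; here both are left abstract.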
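The Dataset Splits and Experiment Setup rows describe fine-tuning BERT/ALBERT on GLUE tasks for five epochs while sweeping the batch size over {32, 64, 128} and the learning rate over {1e-5, ..., 5e-5}, then keeping the most accurate model per task. Below is a minimal sketch of such a sweep with Hugging Face transformers and datasets; the task (SST-2), metric, and model name are illustrative assumptions, and the authors' multi-exit training code is not reproduced here.

```python
# Illustrative sketch of the hyper-parameter sweep described in the table.
# Assumes a single-sentence GLUE task (SST-2); settings and metric are
# placeholders, not the authors' training script.

import itertools

import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)


def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}


def run_sweep(model_name: str = "bert-base-uncased", task: str = "sst2"):
    raw = load_dataset("glue", task)
    tok = AutoTokenizer.from_pretrained(model_name)
    enc = raw.map(lambda b: tok(b["sentence"], truncation=True), batched=True)

    best = (0.0, None)  # (validation accuracy, (batch size, learning rate))
    for bs, lr in itertools.product([32, 64, 128],
                                    [1e-5, 2e-5, 3e-5, 4e-5, 5e-5]):
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=2)
        args = TrainingArguments(
            output_dir=f"out/{task}-bs{bs}-lr{lr}",
            per_device_train_batch_size=bs,
            learning_rate=lr,
            num_train_epochs=5,
        )
        trainer = Trainer(model=model, args=args, tokenizer=tok,
                          train_dataset=enc["train"],
                          eval_dataset=enc["validation"],
                          compute_metrics=accuracy)
        trainer.train()
        acc = trainer.evaluate()["eval_accuracy"]
        if acc > best[0]:
            best = (acc, (bs, lr))
    return best
```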
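The Experiment Setup row fixes an entropy threshold for DeeBERT (chosen to give a 1.5× speedup) and a patience of 6 for PABEE. The sketch below illustrates, under stated assumptions, how the two exit rules decide when to stop: the per-layer logits stand in for the internal classifiers' outputs, and the threshold values are placeholders rather than the tuned ones from the paper.

```python
# Minimal sketch of two early-exit rules referenced in the table.
# `layer_logits` is a list of per-exit-point logits for one input;
# thresholds are placeholders, not the values tuned in the paper.

import torch
import torch.nn.functional as F


def deebert_exit(layer_logits, entropy_threshold: float = 0.3) -> int:
    """DeeBERT-style rule: exit at the first layer whose prediction
    entropy falls below the threshold."""
    for depth, logits in enumerate(layer_logits, start=1):
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        if entropy < entropy_threshold:
            return depth
    return len(layer_logits)   # no early exit: fall through to the final classifier


def pabee_exit(layer_logits, patience: int = 6) -> int:
    """PABEE-style rule: exit once the predicted label has stayed
    unchanged for `patience` consecutive layers."""
    streak, prev = 0, None
    for depth, logits in enumerate(layer_logits, start=1):
        pred = int(logits.argmax())
        streak = streak + 1 if pred == prev else 0
        prev = pred
        if streak >= patience:
            return depth
    return len(layer_logits)


if __name__ == "__main__":
    # Demo with random logits from 12 hypothetical exit points (binary task).
    layer_logits = [torch.randn(2) for _ in range(12)]
    print(deebert_exit(layer_logits), pabee_exit(layer_logits))
```

The slowdown attack audited in the paper targets exactly these rules: inputs are perturbed so that the entropy never drops below the threshold, or the predictions never stay stable long enough for the patience counter to fire, forcing the model through every layer.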