TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization

Authors: Bairu Hou, Jinghan Jia, Yihua Zhang, Guanhua Zhang, Yang Zhang, Sijia Liu, Shiyu Chang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are provided to demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.
Researcher Affiliation | Collaboration | UC Santa Barbara, Michigan State University, MIT-IBM Watson AI Lab
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | Code is available at https://github.com/UCSB-NLP-Chang/TextGrad
Open Datasets | Yes | SST-2 (Socher et al., 2013) for sentiment analysis; MNLI (Williams et al., 2018), RTE (Wang et al., 2018), and QNLI (Wang et al., 2018) for natural language inference; and AG News (Zhang et al., 2015) for text classification.
Dataset Splits | Yes | For datasets whose test-set labels are not available (MNLI, RTE, QNLI), we randomly sample 10% of the training set as a validation set and use the original validation set for testing. For the AG News dataset, which provides no validation set, we generate a validation set in the same way. (A loading/split sketch follows the table.)
Hardware Specification | Yes | We run our experiments on a Tesla V100 GPU with 16 GB of memory.
Software Dependencies | No | No specific version numbers for general software dependencies such as Python, PyTorch, or CUDA were explicitly mentioned.
Experiment Setup | Yes | We fine-tune the pre-trained BERT-base-uncased model on each dataset with a batch size of 32 and a learning rate of 2e-5 for 5 epochs. For RoBERTa-large and ALBERT-xxlarge-v2, we use a batch size of 16 and a learning rate of 1e-5. ... Regarding the hyper-parameters of TextGrad, we use 20-step PGD for optimization and fix the number of samples R in each iteration to 20. We adopt a learning rate of 0.8 for both z and u, and normalize the gradients g_{1,t} and g_{2,t} to unit norm before the descent step. (A schematic PGD update sketch follows the table.)
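
To make the dataset and split protocol above concrete, here is a minimal sketch of one way to reproduce it, assuming the Hugging Face `datasets` library (the paper does not say how the data were loaded). The helper name `with_heldout_validation` and the fixed seed are illustrative choices, not the authors'; only the 10% hold-out fraction and the reuse of the original validation split as the test set come from the paper's description.

```python
# Minimal sketch of the split protocol in the "Dataset Splits" row, assuming the
# Hugging Face `datasets` library. Helper name and seed are illustrative assumptions.
from datasets import load_dataset


def with_heldout_validation(train_split, fraction=0.1, seed=0):
    """Randomly hold out `fraction` of the training split as a validation set."""
    parts = train_split.train_test_split(test_size=fraction, seed=seed)
    return parts["train"], parts["test"]  # (reduced train set, held-out validation set)


# GLUE tasks used in the paper: SST-2, MNLI, RTE, QNLI (RTE shown here).
rte = load_dataset("glue", "rte")
rte_train, rte_val = with_heldout_validation(rte["train"])
rte_test = rte["validation"]  # original validation split reused for testing

# AG News ships without a validation split, so one is carved out of train the same way.
ag_news = load_dataset("ag_news")
ag_train, ag_val = with_heldout_validation(ag_news["train"])
ag_test = ag_news["test"]
```

MNLI and QNLI follow the same pattern; only the GLUE task name changes.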
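
The TextGrad hyper-parameters quoted in the Experiment Setup row (20 PGD steps, R = 20 samples per iteration, learning rate 0.8, unit-norm gradient normalization) can be illustrated with a schematic update loop. The sketch below is an assumption-laden illustration, not the released implementation: the toy loss, tensor shapes, and clamp/renormalize projections are placeholders, and the sampling-based estimation of g_{1,t} and g_{2,t} from R discrete perturbations is omitted.

```python
# Schematic sketch of the 20-step PGD loop described in the "Experiment Setup"
# row: gradients w.r.t. the relaxed site-selection variables z and substitution
# variables u are normalized to unit norm, followed by a descent step with
# learning rate 0.8 and a projection back onto assumed constraint sets.
import torch

torch.manual_seed(0)
seq_len, vocab_size = 16, 50          # toy dimensions (assumed)
num_steps, lr = 20, 0.8               # values quoted from the paper

z = torch.rand(seq_len, requires_grad=True)              # relaxed site-selection scores
u = torch.rand(seq_len, vocab_size, requires_grad=True)  # relaxed substitution weights


def toy_attack_loss(z, u):
    # Smooth stand-in for the attack objective on the victim model (assumption);
    # TextGrad estimates the objective from R = 20 sampled discrete perturbations.
    return ((z - 1.0) ** 2).sum() + ((u - 1.0 / vocab_size) ** 2).sum()


for t in range(num_steps):
    loss = toy_attack_loss(z, u)
    g1, g2 = torch.autograd.grad(loss, (z, u))  # g_{1,t}, g_{2,t}
    g1 = g1 / (g1.norm() + 1e-12)               # normalize g_{1,t} to unit norm
    g2 = g2 / (g2.norm() + 1e-12)               # normalize g_{2,t} to unit norm
    with torch.no_grad():
        z -= lr * g1                            # descent step on z
        u -= lr * g2                            # descent step on u
        z.clamp_(0.0, 1.0)                      # keep z in [0, 1] (assumed constraint)
        u.clamp_(min=0.0)                       # crude projection of u toward the simplex
        u /= u.sum(dim=-1, keepdim=True)
```

Because each gradient is rescaled to unit norm, the fixed step size of 0.8 acts as a pure step length, which helps explain why a single learning rate can be shared between z and u.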