TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization
Authors: Bairu Hou, Jinghan Jia, Yihua Zhang, Guanhua Zhang, Yang Zhang, Sijia Liu, Shiyu Chang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are provided to demonstrate the effectiveness of TEXTGRAD not only in attack generation for robustness evaluation but also in adversarial defense. |
| Researcher Affiliation | Collaboration | UC Santa Barbara, Michigan State University, MIT-IBM Watson AI Lab |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Codes are available at https://github.com/UCSB-NLP-Chang/TextGrad |
| Open Datasets | Yes | SST-2 (Socher et al., 2013) for sentiment analysis, MNLI (Williams et al., 2018), RTE (Wang et al., 2018), and QNLI (Wang et al., 2018) for natural language inference and AG News (Zhang et al., 2015) for text classification. |
| Dataset Splits | Yes | For datasets where the test-set labels are not available (MNLI, RTE, QNLI), we randomly sample 10% of the training set as a validation set and use the original validation set for testing. For AG News, where no validation set is available, we generate the validation set in the same way. (See the split sketch below the table.) |
| Hardware Specification | Yes | We run our experiments on the Tesla V100 GPU with 16GB memory. |
| Software Dependencies | No | No specific version numbers for general software dependencies like Python, PyTorch, or CUDA were explicitly mentioned. |
| Experiment Setup | Yes | We fine-tune the pre-trained BERT-base-uncased model on each dataset with a batch size of 32 and a learning rate of 2e-5 for 5 epochs. For RoBERTa-large and ALBERT-xxlarge-v2, we use a batch size of 16 and a learning rate of 1e-5. ... For the hyper-parameters of TEXTGRAD, we use 20-step PGD for optimization and fix the number of samples R in each iteration to 20. We adopt a learning rate of 0.8 for both z and u, and normalize the gradients g_{1,t} and g_{2,t} to unit norm before the descent step. (See the PGD update sketch below the table.) |
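The validation-split strategy reported in the Dataset Splits row can be illustrated with a short sketch. This is a minimal illustration assuming the Hugging Face `datasets` library and GLUE task names; the actual split code in the TextGrad repository may differ.

```python
# Minimal sketch of the described 10% validation split, assuming the Hugging
# Face `datasets` library and GLUE task names; the actual procedure in the
# TextGrad repository (https://github.com/UCSB-NLP-Chang/TextGrad) may differ.
from datasets import load_dataset

def build_splits(task_name: str, seed: int = 42):
    """For tasks without public test labels (MNLI, RTE, QNLI), hold out 10%
    of the training data as validation and reuse the original validation
    split as the test set."""
    raw = load_dataset("glue", task_name)
    held_out = raw["train"].train_test_split(test_size=0.1, seed=seed)
    train_set = held_out["train"]   # remaining 90% of the training data
    val_set = held_out["test"]      # 10% of training data, used as validation
    test_key = "validation_matched" if task_name == "mnli" else "validation"
    test_set = raw[test_key]        # original validation split becomes the test set
    return train_set, val_set, test_set
```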
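The TEXTGRAD hyper-parameters in the Experiment Setup row (20 PGD steps, R = 20 samples per step, learning rate 0.8, unit-norm gradients) suggest the following hedged sketch of the update loop. The callback `estimate_gradients`, the variable names `z` and `u`, and the projection steps are illustrative placeholders rather than the paper's exact formulation; the released code contains the real implementation.

```python
# Hedged sketch of the reported optimization settings: 20 PGD iterations,
# R = 20 sampled perturbations per iteration, learning rate 0.8 for both z
# (site selection) and u (substitution) variables, and gradients g_{1,t},
# g_{2,t} normalized to unit norm before the descent step. The callback
# `estimate_gradients` and the projections are placeholders, not the paper's
# exact operators.
import torch

def pgd_step(z, u, g1, g2, lr=0.8):
    """One projected-gradient step on the relaxed attack variables z and u."""
    g1 = g1 / (g1.norm() + 1e-12)   # normalize g_{1,t} to unit norm
    g2 = g2 / (g2.norm() + 1e-12)   # normalize g_{2,t} to unit norm
    z = z - lr * g1                 # descent step on the attack loss
    u = u - lr * g2
    # Crude placeholder projections back onto feasible sets; the paper's exact
    # projections (e.g., onto a sparsity-constrained simplex) are not shown here.
    z = z.clamp(0.0, 1.0)
    u = u.clamp(min=0.0)
    u = u / u.sum(dim=-1, keepdim=True).clamp(min=1e-12)
    return z, u

def textgrad_attack_sketch(z, u, estimate_gradients, steps=20, num_samples=20, lr=0.8):
    """Run the 20-step loop; `estimate_gradients` is a hypothetical callback
    that draws `num_samples` discrete samples and returns (g1, g2)."""
    for _ in range(steps):
        g1, g2 = estimate_gradients(z, u, num_samples=num_samples)
        z, u = pgd_step(z, u, g1, g2, lr=lr)
    return z, u
```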