TrojText: Test-time Invisible Textual Trojan Insertion

Authors: Qian Lou, Yepeng Liu, Bo Feng

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The TrojText approach was evaluated on three datasets (AG's News, SST-2, and OLID) using three NLP models (BERT, XLNet, and DeBERTa). The experiments demonstrated that TrojText achieved 98.35% classification accuracy for test sentences in the target class on the BERT model for the AG's News dataset. |
| Researcher Affiliation | Collaboration | Qian Lou, University of Central Florida, qian.lou@ucf.edu; Yepeng Liu, University of Central Florida, yepeng.liu@knights.ucf.edu; Bo Feng, Meta Platforms, Inc., AI Infra, bfeng@meta.com |
| Pseudocode | Yes | Algorithm 1: Pseudocode of Trojan Weights Pruning in TrojText |
| Open Source Code | Yes | The source code for TrojText is available at https://github.com/UCF-ML-Research/TrojText. |
| Open Datasets | Yes | "We evaluate the effects of our proposed TrojText attack on three textual tasks whose datasets are AG's News (Zhang et al., 2015), Stanford Sentiment Treebank (SST-2) (Socher et al., 2013) and Offensive Language Identification Dataset (OLID) (Zampieri et al., 2019)." |
| Dataset Splits | Yes | "We use validation datasets to train the target model and test the poisoned model on the test dataset. The details of these datasets are presented in Table 1." |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | "For those three models, we choose bert-base-uncased, xlnet-base-cased and microsoft/deberta-base respectively from the Transformers library (Wolf et al., 2020)." |
| Experiment Setup | Yes | "For the hyperparameters of the loss function, in our experiment we set λ = 0.5, λ_L = 0.5 and λ_R = 0.5. More details can be found in the supplementary materials, and code is available to reproduce our results. [...] we use the Neural Gradient Ranking (NGR) method in TBT to identify the top 500 most important weights in the last layer of the target model and apply the logit loss function presented in Equation 1 for backdoor training. [...] We study the effects of the threshold e in Section 5." |
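The two mechanisms named above — gradient-based selection of the most important last-layer weights (NGR) and Trojan Weights Pruning with a threshold e — can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification for intuition only, not the authors' implementation: `ngr_top_k` stands in for NGR by ranking weights by absolute gradient magnitude, and `prune_trojan_weights` reverts any backdoor weight change smaller than the threshold `e`.

```python
import numpy as np

def ngr_top_k(grad, k=500):
    """Return indices of the k weights with the largest absolute gradient
    (a simplified, hypothetical stand-in for Neural Gradient Ranking)."""
    flat = np.abs(grad).ravel()
    k = min(k, flat.size)
    return np.argpartition(flat, -k)[-k:]

def prune_trojan_weights(w_clean, w_poisoned, e):
    """Revert backdoor weight changes whose magnitude is below threshold e,
    keeping only the most impactful trojan modifications (a sketch of the
    Trojan Weights Pruning idea, not the paper's Algorithm 1 verbatim)."""
    delta = w_poisoned - w_clean
    keep = np.abs(delta) >= e
    return np.where(keep, w_poisoned, w_clean), int(keep.sum())

# Toy usage on a small weight vector standing in for the last layer.
rng = np.random.default_rng(0)
w_clean = rng.normal(size=10)
grad = rng.normal(size=10)              # pretend gradient of the attack loss
idx = ngr_top_k(grad, k=3)              # 3 "most important" weights

w_poisoned = w_clean.copy()
w_poisoned[idx] += rng.normal(scale=0.5, size=3)  # injected trojan deltas

w_pruned, n_kept = prune_trojan_weights(w_clean, w_poisoned, e=0.2)
print(n_kept)  # number of modified weights surviving pruning
```

Raising `e` trades attack strength for stealth: fewer weights differ from the clean model, which is the bit-efficiency effect the paper studies in its Section 5 threshold ablation.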