AttExplainer: Explain Transformer via Attention by Reinforcement Learning

Authors: Runliang Niu, Zhepei Wei, Yan Wang, Qi Wang

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method compared to state-of-the-art baselines.
Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University; (2) Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University
Pseudocode | Yes | Algorithm 1: DQN training progress. (A minimal training-loop sketch appears after the table.)
Open Source Code | Yes | The code of this paper is available at https://github.com/niuzaisheng/AttExplainer.
Open Datasets | Yes | Datasets. We used two types of text classification settings: single-sentence classification and Natural Language Inference (NLI). Specifically, the single-sentence classification datasets include the Emotion dataset [Saravia et al., 2018] and the Stanford Sentiment Treebank (SST2) dataset [Wang and others, 2019]. The NLI dataset we used is the SNLI corpus [Bowman and others, 2015]. The details of these datasets are presented in Table 1. (A dataset-loading sketch appears after the table.)
Dataset Splits | No | Table 1 lists '# Train' and '# Test' counts for each dataset, indicating train/test splits, but no validation split is described with specific sizes or percentages, which limits reproducibility.
Hardware Specification | Yes | We trained our model on multiple Titan RTX graphics cards.
Software Dependencies | No | The paper mentions software such as Hugging Face, the Captum toolkit, and OpenAttack, but does not provide version numbers for these tools or for other key software dependencies.
Experiment Setup | Yes | The max game step is limited to 100. The max size of the replay buffer for PER is 100,000. The Adam optimizer [Kingma and Ba, 2015] was used to train the DQN model at a fixed learning rate of 10^-4 with a training batch size of 256. We set α = 10, β = 0.2, ϵ = 0.7, γ = 0.9. The number of feature bins b is fixed at 32. The parameters of the DQN target net are replaced by those of the eval net every 100 learning steps. (These values are gathered into a config sketch after the table.)
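The Pseudocode row refers to the paper's Algorithm 1, a DQN training loop with a target network synced from the eval network every 100 learning steps. Below is a minimal PyTorch sketch of a generic DQN update that wires in the paper's reported learning rate, discount factor, and target-sync interval; the network architecture, `state_dim`, `n_actions`, and `dqn_update` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Illustrative Q-network; the paper's actual architecture differs."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )

    def forward(self, x):
        return self.net(x)

state_dim, n_actions = 32, 10            # placeholder sizes
eval_net = QNet(state_dim, n_actions)    # updated at every learning step
target_net = QNet(state_dim, n_actions)  # synced periodically from eval_net
target_net.load_state_dict(eval_net.state_dict())
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-4)  # lr from the paper
gamma = 0.9  # discount factor reported in the paper

def dqn_update(batch, learn_step: int) -> float:
    """One TD update on a (state, action, reward, next_state, done) batch."""
    s, a, r, s_next, done = batch
    q = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * q_next * (1.0 - done)
    loss = nn.functional.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # The paper replaces the target net's parameters with the eval net's
    # every 100 learning steps.
    if learn_step % 100 == 0:
        target_net.load_state_dict(eval_net.state_dict())
    return loss.item()
```

This sketch samples from a plain batch and omits the prioritized experience replay (PER) the paper uses; PER would additionally weight the TD loss by importance-sampling weights when drawing from the replay buffer.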
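The three benchmarks quoted in the Open Datasets row are all distributed on the Hugging Face Hub, which the paper's tooling already depends on. A hedged loading sketch follows; the Hub identifiers are an assumption on my part, not something the paper specifies.

```python
from datasets import load_dataset

# Hub IDs are assumptions; the paper names the datasets but not Hub paths.
emotion = load_dataset("dair-ai/emotion")  # Emotion [Saravia et al., 2018]
sst2 = load_dataset("glue", "sst2")        # SST2, distributed via GLUE
snli = load_dataset("snli")                # SNLI [Bowman and others, 2015]

for name, ds in [("emotion", emotion), ("sst2", sst2), ("snli", snli)]:
    print(name, {split: len(ds[split]) for split in ds})
```

Note that these Hub versions ship train/validation/test splits, whereas the paper's Table 1 reports only '# Train' and '# Test', which is exactly the gap flagged in the Dataset Splits row.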
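Finally, the hyperparameters quoted in the Experiment Setup row can be gathered into a single config for reference. The key names and comments are mine; the roles of α, β, and ϵ are not stated in the quoted text, so they are left uninterpreted here.

```python
# Values quoted from the paper; key names and comments are illustrative.
CONFIG = {
    "max_game_steps": 100,           # per-episode step limit
    "replay_buffer_size": 100_000,   # max size of the PER buffer
    "optimizer": "Adam",             # [Kingma and Ba, 2015]
    "learning_rate": 1e-4,           # fixed learning rate
    "batch_size": 256,               # training batch size
    "alpha": 10,                     # α (role not stated in the quote)
    "beta": 0.2,                     # β (role not stated in the quote)
    "epsilon": 0.7,                  # ϵ (role not stated in the quote)
    "gamma": 0.9,                    # γ, conventionally the DQN discount factor
    "feature_bins": 32,              # number of feature bins b
    "target_sync_every": 100,        # learning steps between target-net syncs
}
```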