Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
AttExplainer: Explain Transformer via Attention by Reinforcement Learning
Authors: Runliang Niu, Zhepei Wei, Yan Wang, Qi Wang
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method compared to state-of-the-art baselines. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University; (2) Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University |
| Pseudocode | Yes | Algorithm 1 DQN training progress |
| Open Source Code | Yes | The code of this paper is available at https://github.com/niuzaisheng/AttExplainer. |
| Open Datasets | Yes | Datasets. We used two types of text classification settings: single sentence classification and Natural Language Inference (NLI). Specifically, single sentence classification datasets include the Emotion dataset [Saravia et al., 2018] and the Stanford Sentiment Treebank (SST2) dataset [Wang and others, 2019]. The NLI dataset we used is the SNLI corpus [Bowman and others, 2015]. The details of these datasets are presented in Table 1. |
| Dataset Splits | No | Table 1 lists '# Train' and '# Test' for each dataset, indicating training and testing splits, but there is no explicit mention of a validation split with specific numbers or percentages for reproducibility. |
| Hardware Specification | Yes | We trained our model on multiple Titan RTX graphics cards. |
| Software Dependencies | No | The paper mentions software like Huggingface, Captum toolkit, and Open Attack, but does not provide specific version numbers for any of these tools or other key software dependencies. |
| Experiment Setup | Yes | The max game step is limited to 100. Max size of replacing buffer for PER is 100,000. The Adam optimizer [Kingma and Ba, 2015] was used for training the DQN model at a fixed learning rate 10^-4, training batch size is 256. We set α = 10, β = 0.2, ϵ = 0.7, γ = 0.9. The number of feature bins b is fixed at 32. The parameters of the DQN target net are replaced by the eval net every 100 learning steps. |
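
For readers who want to carry the quoted experiment setup into their own code, the values in the last table row can be collected in a small configuration object. The following is a minimal sketch, not the authors' implementation: the class name `ReportedSetup`, its field names, and the helper `should_sync_target` are hypothetical and do not come from the paper or its repository. The roles of α, β, ϵ, and γ are not explained in the quoted text, so they are stored as reported without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""

    max_game_steps: int = 100          # "max game step is limited to 100"
    per_buffer_size: int = 100_000     # max buffer size for PER
    learning_rate: float = 1e-4        # fixed Adam learning rate
    batch_size: int = 256              # training batch size
    alpha: float = 10.0                # reported as α = 10 (role not stated here)
    beta: float = 0.2                  # reported as β = 0.2 (role not stated here)
    epsilon: float = 0.7               # reported as ϵ = 0.7 (role not stated here)
    gamma: float = 0.9                 # reported as γ = 0.9 (role not stated here)
    feature_bins: int = 32             # number of feature bins b
    target_sync_interval: int = 100    # learning steps between target-net updates


def should_sync_target(learn_step: int, cfg: ReportedSetup) -> bool:
    """Return True on learning steps where the target net would be overwritten.

    Assumption: steps are counted from 1, so the first sync happens at step 100.
    """
    return learn_step > 0 and learn_step % cfg.target_sync_interval == 0


if __name__ == "__main__":
    cfg = ReportedSetup()
    sync_steps = [s for s in range(1, 301) if should_sync_target(s, cfg)]
    print(cfg)
    print("target net synced at learning steps:", sync_steps)
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run. Any deviation from the paper's settings would then be an explicit `dataclasses.replace` call.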