Flexible Instance-Specific Rationalization of NLP Models
Authors: George Chrysostomou, Nikolaos Aletras
AAAI 2022, pp. 10545-10553
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation on four standard text classification datasets shows that our proposed method provides more faithful, comprehensive and highly sufficient explanations compared to using a fixed feature scoring method, rationale length and type. From Section 4 (Experimental Setup): For our experiments we use the following datasets (details in Table 1). |
| Researcher Affiliation | Academia | George Chrysostomou, Nikolaos Aletras Department of Computer Science, University of Sheffield gchrysostomou1@sheffield.ac.uk, n.aletras@sheffield.ac.uk |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code for experiments available at: https://github.com/GChrysostomou/instance-specific-rationale |
| Open Datasets | Yes | For our experiments we use the following datasets (details in Table 1): SST: Binary sentiment classification without neutral sentences (Socher et al. 2013). AG: News articles categorized in Science, Sports, Business, and World topics (Corso, Gulli, and Romani 2005). Evidence Inference (EV.INF.): Abstract-only biomedical articles describing randomized controlled trials. ... (Lehman et al. 2019). Multi RC (M.RC): A reading comprehension task... (Khashabi et al. 2018). |
| Dataset Splits | Yes | Train/Dev/Test splits (with avg. length \|W\|, classes C, full-text F1±std, rationale length N): SST: \|W\|=18, C=2, 6,920 / 872 / 1,821, F1 90.1±0.2, N=20%; AG: \|W\|=36, C=4, 102,000 / 18,000 / 7,600, F1 93.5±0.2, N=20%; Ev.Inf.: \|W\|=363, C=3, 5,789 / 684 / 720, F1 83.0±1.6, N=10%; M.RC: \|W\|=305, C=2, 24,029 / 3,214 / 4,848, F1 73.2±1.7, N=20% |
| Hardware Specification | No | No specific details about the hardware (e.g., GPU model, CPU type, memory) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were explicitly mentioned. |
| Experiment Setup | Yes | For our work, we use a 2% skip rate, which led to a seven-fold reduction in the time required to compute rationales for datasets comprising long sequences, such as M.RC and Ev.Inf., with faithfulness comparable to the slower process of removing one token at a time. We set N as the upper bound rationale length for our approach to make results comparable with fixed-length rationales. A sketch of this length-search procedure follows the table. |
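
The 2% skip rate refers to the step size of the per-instance rationale-length search: candidate lengths are evaluated in increments of 2% of the sequence length instead of one token at a time, up to the fixed upper bound N. Below is a minimal sketch of that idea, assuming a caller-supplied `predict_proba` classifier wrapper and precomputed token importance `scores`; all names are illustrative, not the authors' implementation, and the sufficiency/comprehensiveness scores follow the standard ERASER-style definitions (DeYoung et al. 2020) rather than the paper's exact normalized variants:

```python
import numpy as np

def sufficiency(predict_proba, tokens, keep_idx, label, full_prob):
    # Sufficiency asks: does the rationale alone preserve the prediction?
    # Scored as 1 - max(0, p(y|x) - p(y|rationale)); higher is better.
    kept = [tokens[i] for i in sorted(keep_idx)]
    return 1.0 - max(0.0, full_prob - predict_proba(kept)[label])

def comprehensiveness(predict_proba, tokens, keep_idx, label, full_prob):
    # Comprehensiveness asks: does removing the rationale hurt the prediction?
    # Scored as p(y|x) - p(y|x without rationale); higher is better.
    kept = set(int(i) for i in keep_idx)
    rest = [t for i, t in enumerate(tokens) if i not in kept]
    return full_prob - predict_proba(rest)[label]

def instance_rationale(predict_proba, tokens, scores, label,
                       upper_bound=0.2, skip_rate=0.02):
    """Pick a per-instance rationale by scoring candidate lengths in steps
    of `skip_rate` * len(tokens) (rather than one token at a time), up to
    the fixed upper bound N = `upper_bound` * len(tokens)."""
    full_prob = predict_proba(tokens)[label]
    order = np.argsort(scores)[::-1]              # tokens ranked by importance
    max_len = max(1, int(upper_bound * len(tokens)))
    step = max(1, int(skip_rate * len(tokens)))
    best_idx, best_suff = order[:max_len], -np.inf
    for k in range(step, max_len + 1, step):
        s = sufficiency(predict_proba, tokens, order[:k], label, full_prob)
        if s > best_suff:
            best_idx, best_suff = order[:k], s
    return best_idx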
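```

For an average Ev.Inf. abstract (363 tokens, N = 10%), a 2% step evaluates roughly 5 candidate lengths instead of about 36, which is consistent with the seven-fold speed-up the authors report for long-sequence datasets.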