Improving Interpretability via Explicit Word Interaction Graph Layer
Authors: Arshdeep Sekhon, Hanjie Chen, Aman Shrivastava, Zhe Wang, Yangfeng Ji, Yanjun Qi
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design experiments to answer the following: 1. Are NLP models augmented with WIGRAPH layer more interpretable models? 2. Do NLP models augmented with WIGRAPH layer predict well? Besides, we extend WIGRAPH to one concept based vision task in Section 4.5. [...] Our empirical analysis covers six popular text classification datasets as detailed by Table 3. |
| Researcher Affiliation | Academia | University of Virginia, Charlottesville, USA as5cu@virginia.edu, hc9mx@virginia.edu, as3ek@virginia.edu, zw6sg@virginia.edu, yj3fs@virginia.edu, yanjun@virginia.edu |
| Pseudocode | No | The paper describes the WIGRAPH layer and its operations using mathematical equations and text (e.g., Section 2 and its subsections), but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | Our empirical analysis covers six popular text classification datasets as detailed by Table 3. These six datasets are "sst1", "sst2" (Socher et al. 2013), "imdb" (Maas et al. 2011), "AG news" (Zhang, Zhao, and Le Cun 2015), "TREC" (Li and Roth 2002) and "Subj" (Pang and Lee 2005). |
| Dataset Splits | Yes | Table 3: Summary of datasets we use in experiments (Dataset: Train/Dev/Test, C, V, L) — sst1: 8544/1101/2210, 5, 17838, 50; sst2: 6920/872/1821, 2, 16190, 50; imdb: 20K/5K/25K, 2, 29571, 250; AG News: 114K/6K/7.6K, 4, 21838, 50; TREC: 5000/452/500, 6, 8026, 15; Subj: 8000/1000/1000, 2, 9965, 25 |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments. It only mentions the types of models used (LSTM, BERT, RoBERTa, distilBERT). |
| Software Dependencies | No | The paper mentions various models and techniques used (e.g., LSTM, BERT, RoBERTa, distilBERT, Gumbel-Softmax, word embeddings from Mikolov et al.), but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | Hyperparameter Tuning: We perform fine-tuning on each model (batch size = 64). We fix the word embedding layer and train the WIGRAPH layer along with the rest of a BASE model. For the LSTM models, we vary the hidden size in {100, 300, 500} and dropout in {0.0, 0.2, 0.3}. We set βsparse ∈ {1e-02, 1e-03, 1e-04}, βg ∈ {1.0, 1e-02, 1e-03, 1e-04} and βi ∈ {1.0, 1e-02, 1e-03, 1e-04}. The learning rate is tuned from the set {0.0001, 0.0005, 0.005, 0.001}. For transformer-based models, we vary dropout in {0.2, 0.3, 0.5} and the hidden dimension used to compute R in {128, 256, 512}. We set βsparse, βg, βi = 1.0 and anneal them by a factor of 0.1 every epoch. |
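
The Experiment Setup row is essentially a search-grid specification. As a reading aid only, here is a minimal Python sketch of that grid, assuming a hypothetical `train_and_eval(cfg)` routine that trains a WIGRAPH-augmented base model and returns a dev-set score; none of these names come from the authors' code, which is not released per the table above.

```python
# Hedged sketch of the hyperparameter search described in the Experiment Setup row.
# `train_and_eval` is a hypothetical placeholder, not the authors' implementation.
from itertools import product

# LSTM-based models: ranges quoted from the paper.
lstm_grid = {
    "hidden_size": [100, 300, 500],
    "dropout": [0.0, 0.2, 0.3],
    "beta_sparse": [1e-2, 1e-3, 1e-4],
    "beta_g": [1.0, 1e-2, 1e-3, 1e-4],
    "beta_i": [1.0, 1e-2, 1e-3, 1e-4],
    "lr": [0.0001, 0.0005, 0.005, 0.001],
}

# Transformer-based models (BERT / RoBERTa / distilBERT): betas are fixed at 1.0
# and annealed by a factor of 0.1 each epoch rather than grid-searched.
transformer_grid = {
    "dropout": [0.2, 0.3, 0.5],
    "hidden_dim_R": [128, 256, 512],
}

BATCH_SIZE = 64  # fixed batch size reported in the paper


def annealed_beta(initial: float, epoch: int, factor: float = 0.1) -> float:
    """Per-epoch annealing schedule described for the transformer models."""
    return initial * (factor ** epoch)


def run_grid(grid, train_and_eval):
    """Try every combination in `grid` and keep the configuration with the best dev score."""
    best_score, best_cfg = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values), batch_size=BATCH_SIZE)
        score = train_and_eval(cfg)  # hypothetical: returns dev accuracy for this config
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```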