Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs
Authors: W. James Murdoch, Peter J. Liu, Bin Yu
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now describe our empirical validation of CD on the task of sentiment analysis. First, we verify that, on the standard problem of word-level importance scores, CD compares favorably to prior work. Then we examine the behavior of CD for word and phrase level importance in situations involving compositionality, showing that CD is able to capture the composition of phrases of differing sentiment. Finally, we show that CD is capable of extracting instances of positive and negative negation. Code for computing CD scores is available online. |
| Researcher Affiliation | Collaboration | W. James Murdoch, Department of Statistics, University of California, Berkeley (jmurdoch@berkeley.edu); Peter J. Liu, Google Brain, Mountain View, CA; Bin Yu, Department of Statistics and Department of EECS, University of California, Berkeley |
| Pseudocode | No | The paper includes mathematical equations and derivations, but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Code for computing CD scores is available online: https://github.com/jamie-murdoch/ContextualDecomposition |
| Open Datasets | Yes | We trained an LSTM model on the binary version of the Stanford Sentiment Treebank (SST) (Socher et al., 2013), a standard NLP benchmark which consists of movie reviews ranging from 2 to 52 words long. ... Originally introduced in Zhang et al. (2015), the Yelp review polarity dataset was obtained from the Yelp Dataset Challenge and has train and test sets of sizes 560,000 and 38,000. |
| Dataset Splits | Yes | All models were optimized using Adam (Kingma & Ba, 2014) with the default learning rate of 0.001 using early stopping on the validation set. ... the Yelp review polarity dataset ... has train and test sets of sizes 560,000 and 38,000. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Torch' and 'Adam' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We implemented all models in Torch using default hyperparameters for weight initializations. All models were optimized using Adam (Kingma & Ba, 2014) with the default learning rate of 0.001 using early stopping on the validation set. ... the word and hidden representations of our LSTM were set to 300 and 168, and word vectors were initialized to pretrained Glove vectors (Pennington et al., 2014). |
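To make the dataset rows above concrete, the sketch below fetches the two quoted corpora. This is a minimal stand-in using the Hugging Face `datasets` library, not the authors' original Torch pipeline; the hub identifiers `sst2` and `yelp_polarity` are assumptions about current dataset names.

```python
# Minimal sketch: load the two datasets quoted in the table above.
from datasets import load_dataset

# Binary Stanford Sentiment Treebank (SST-2), which ships with the
# train/validation/test splits that early stopping on a validation
# set presupposes.
sst2 = load_dataset("sst2")
print({split: len(ds) for split, ds in sst2.items()})

# Yelp review polarity (Zhang et al., 2015): the paper quotes train
# and test sets of sizes 560,000 and 38,000.
yelp = load_dataset("yelp_polarity")
print(len(yelp["train"]), len(yelp["test"]))  # expected: 560000 38000
```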
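The Experiment Setup row pins down the key hyperparameters: 300-dimensional word vectors initialized from pretrained GloVe, a hidden size of 168, Adam at its default learning rate of 0.001, and early stopping on the validation set. The following is a hedged PyTorch sketch of that configuration; since the paper implemented its models in (Lua) Torch, the class name, vocabulary size, early-stopping patience, and `validation_loss` helper here are all hypothetical.

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Hypothetical reconstruction of the quoted model configuration."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=168,
                 num_classes=2, glove_weights=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        if glove_weights is not None:
            # Word vectors initialized to pretrained GloVe embeddings.
            self.embed.weight.data.copy_(glove_weights)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.out(h_n[-1])  # classify from the final hidden state

def validation_loss(model) -> float:
    # Placeholder for a real pass over the validation set (hypothetical).
    return 0.0

model = SentimentLSTM(vocab_size=20_000)  # vocab size is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # quoted default

# Early stopping on the validation set: track the best validation loss
# and stop once `patience` consecutive epochs fail to improve it.
best_val, bad_epochs, patience = float("inf"), 0, 3
for epoch in range(50):
    # ... one epoch of training on the train split would go here ...
    val = validation_loss(model)
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

Because the paper reports no hardware specification, nothing in this sketch assumes a GPU; it runs as-is on CPU, which is consistent with the "Hardware Specification: No" finding above.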