Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

Authors: W. James Murdoch, Peter J. Liu, Bin Yu

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now describe our empirical validation of CD on the task of sentiment analysis. First, we verify that, on the standard problem of word-level importance scores, CD compares favorably to prior work. Then we examine the behavior of CD for word and phrase level importance in situations involving compositionality, showing that CD is able to capture the composition of phrases of differing sentiment. Finally, we show that CD is capable of extracting instances of positive and negative negation. Code for computing CD scores is available online."
Researcher Affiliation | Collaboration | W. James Murdoch, Department of Statistics, University of California, Berkeley (jmurdoch@berkeley.edu); Peter J. Liu, Google Brain, Mountain View, CA; Bin Yu, Department of Statistics and Department of EECS, University of California, Berkeley
Pseudocode | No | The paper includes mathematical equations and derivations but does not contain a dedicated pseudocode or algorithm block.
Open Source Code | Yes | "Code for computing CD scores is available online": https://github.com/jamie-murdoch/ContextualDecomposition
Open Datasets | Yes | "We trained an LSTM model on the binary version of the Stanford Sentiment Treebank (SST) (Socher et al., 2013), a standard NLP benchmark which consists of movie reviews ranging from 2 to 52 words long. ... Originally introduced in Zhang et al. (2015), the Yelp review polarity dataset was obtained from the Yelp Dataset Challenge and has train and test sets of sizes 560,000 and 38,000."
Dataset Splits | Yes | "All models were optimized using Adam (Kingma & Ba, 2014) with the default learning rate of 0.001 using early stopping on the validation set. ... the Yelp review polarity dataset ... has train and test sets of sizes 560,000 and 38,000."
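The training setup quoted above relies on early stopping against a validation set. A minimal sketch of that stopping rule follows; the loss sequence and the patience value are illustrative assumptions, not details taken from the paper:

```python
# Hedged sketch of validation-based early stopping; the paper states only that
# early stopping on the validation set was used, so patience and losses here
# are made-up placeholders for illustration.
def early_stop_epoch(val_losses, patience=3):
    """Return the 0-indexed epoch at which training would stop."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs: stop
    return len(val_losses) - 1  # ran out of epochs without triggering

losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]
print(early_stop_epoch(losses))  # -> 5: stops 3 epochs after the epoch-2 minimum
```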
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions 'Torch' and 'Adam' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We implemented all models in Torch using default hyperparameters for weight initializations. All models were optimized using Adam (Kingma & Ba, 2014) with the default learning rate of 0.001 using early stopping on the validation set. ... the word and hidden representations of our LSTM were set to 300 and 168, and word vectors were initialized to pretrained Glove vectors (Pennington et al., 2014)."
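The reported setup (300-dimensional word vectors, a 168-unit LSTM hidden state, Adam with learning rate 0.001) can be sketched as follows. This is a PyTorch approximation, not the paper's original Torch/Lua code; the vocabulary size, class count, and random embedding initialization (GloVe in the paper) are assumptions for illustration:

```python
# Hedged PyTorch sketch of the paper's reported LSTM configuration:
# 300-d embeddings (GloVe-initialized in the paper; random here) and a
# 168-unit hidden state, trained with Adam at the default lr of 0.001.
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=168, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):
        hidden_states, _ = self.lstm(self.embed(tokens))
        return self.out(hidden_states[:, -1])  # classify from the final hidden state

model = SentimentLSTM(vocab_size=10000)  # vocab size is an assumed placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # paper's default LR
logits = model(torch.randint(0, 10000, (4, 12)))  # batch of 4 length-12 reviews
print(logits.shape)  # torch.Size([4, 2])
```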