Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

Authors: Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on two natural language understanding tasks demonstrate that our framework can not only make effective predictions in both supervised and semi-supervised settings, but also generate good natural language explanations. |
| Researcher Affiliation | Collaboration | 1 Beihang University; 2 Tsinghua University; 3 South China University of Technology; 4 Sun Yat-sen University; 5 Microsoft Research; 6 Mila-Québec AI Institute; 7 HEC Montréal |
| Pseudocode | Yes | Algorithm 1: Explanation-based Self-Training (ELV-EST) (a sketch of this self-training loop is given after the table) |
| Open Source Code | Yes | Code is available at https://github.com/JamesHujy/ELV.git |
| Open Datasets | Yes | We conduct experiments on two tasks: relation extraction (RE) and aspect-based sentiment classification (ASC). For relation extraction we choose two datasets, TACRED [23] and SemEval [21], in our experiments. We use two customer review datasets, Restaurant and Laptop, which are part of SemEval 2014 Task 4 [24], for the aspect-based sentiment classification task. |
| Dataset Splits | Yes | Table 1: Statistics of datasets. We present the size of train/dev/test sets for 4 datasets in both supervised and semi-supervised settings. Moreover, # Exp means the size of initial explanation sets. ... SemEval [21] 203 7,016 1,210 800 2,715 |
| Hardware Specification | No | The paper mentions using 'BERT-base and UniLM-base as the backbone of our prediction model and explanation generation model, respectively,' but does not specify any hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using BERT-base and UniLM-base as backbone models, Sentence-BERT [19] for embeddings, and Adam optimizers, but it does not provide specific version numbers for these or for other software libraries and frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We select batch size over {32, 64} and learning rate over {1e-5, 2e-5, 3e-5}. The number of retrieved explanations is set to 10 for all tasks. We train the prediction model for 3 epochs and the generation model for 5 epochs in each EM iteration. We use Adam optimizers and early stopping with the best validation F1-score. (A configuration sketch restating these settings follows the table.) |