Learning from Explanations with Neural Execution Tree

Authors: Ziqi Wang*, Yujia Qin*, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves performance gain with light annotation effort. We conduct experiments on two tasks: relation extraction and aspect-term-level sentiment analysis.
Researcher Affiliation | Collaboration | Ziqi Wang (1), Yujia Qin (1), Wenxuan Zhou (2), Jun Yan (2), Qinyuan Ye (2), Leonardo Neves (3), Zhiyuan Liu (1), Xiang Ren (2); (1) Tsinghua University, (2) University of Southern California, (3) Snap Research
Pseudocode | Yes | Algorithm 1: Learning on Unlabeled Data with NExT
Open Source Code | Yes | Code: https://github.com/INK-USC/NExT
Open Datasets | Yes | For RE we choose two datasets, TACRED (Zhang et al., 2017) and SemEval (Hendrickx et al., 2009) in our experiments. For this task we use two customer review datasets, Restaurant and Laptop, which are part of SemEval 2014 Task 4.
Dataset Splits | No | The paper discusses training on labeled and unlabeled data and reports F1 scores on the test set, but it does not specify explicit training/validation/test splits (e.g., percentages or counts for each split) needed for reproduction across all datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components like GloVe embeddings and Adagrad optimizer, but does not provide specific version numbers for these or other key libraries/frameworks (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | We use 300-dimensional word embeddings pre-trained by GloVe (Pennington et al., 2014). The dropout rate is 0.96 for word embeddings and 0.5 for sentence encoder. The hidden state size of the encoder and attention layer is 300 and 200 respectively. We choose Adagrad as the optimizer and the learning rate for joint model learning is 0.5. For TACRED, we set the learning rate to 0.1 in the pretraining stage. The total epochs for pretraining are 10. The weight for Lsim is set to 0.5. The batch size for pretraining is set to 100. For training the classifier, the batch size for labeled data and unlabeled data is 50 and 100 respectively, the weight α for Lu is set to 0.7, the weight β for Lstring is set to 0.2, the weight γ for Lsim is set to 2.5.
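The Experiment Setup row quotes most of the reported hyperparameters. Below is a minimal sketch of how that configuration could be written down, assuming a PyTorch implementation and assuming the loss terms named in the excerpt (Lu, Lstring, Lsim) are combined as a plain weighted sum with the labeled-data loss; the exact combination is not spelled out in this excerpt. All identifiers (CONFIG, combined_loss, the placeholder model) are hypothetical, and only the numeric values come from the quoted setup.

# Hypothetical sketch of the hyperparameters quoted in the "Experiment Setup" row.
# Only the numeric values are taken from the excerpt; all names are assumptions.
import torch

CONFIG = {
    "word_emb_dim": 300,        # 300-d GloVe embeddings (Pennington et al., 2014)
    "dropout_word_emb": 0.96,   # dropout rate reported for word embeddings
    "dropout_encoder": 0.5,     # dropout rate for the sentence encoder
    "hidden_encoder": 300,      # hidden state size of the encoder
    "hidden_attention": 200,    # hidden state size of the attention layer
    "optimizer": "Adagrad",
    "lr_joint": 0.5,            # learning rate for joint model learning
    "lr_pretrain_tacred": 0.1,  # pretraining learning rate on TACRED
    "pretrain_epochs": 10,
    "pretrain_sim_weight": 0.5, # weight for Lsim during pretraining
    "pretrain_batch_size": 100,
    "batch_labeled": 50,        # classifier batch size for labeled data
    "batch_unlabeled": 100,     # classifier batch size for unlabeled data
    "alpha_unlabeled": 0.7,     # weight α for Lu
    "beta_string": 0.2,         # weight β for Lstring
    "gamma_sim": 2.5,           # weight γ for Lsim
}

def combined_loss(l_labeled, l_u, l_string, l_sim, cfg=CONFIG):
    # Assumed weighted sum of the loss terms named in the excerpt;
    # the paper section quoted here gives the weights but not the formula.
    return (l_labeled
            + cfg["alpha_unlabeled"] * l_u
            + cfg["beta_string"] * l_string
            + cfg["gamma_sim"] * l_sim)

# Example: build the Adagrad optimizer with the reported joint-training learning rate.
model = torch.nn.Linear(CONFIG["word_emb_dim"], 10)  # placeholder module and class count
optimizer = torch.optim.Adagrad(model.parameters(), lr=CONFIG["lr_joint"])

Note that the excerpt gives separate values for the pretraining stage (Lsim weight 0.5, learning rate 0.1 on TACRED, 10 epochs, batch size 100), so a pretraining loop would use those rather than the joint-training settings above.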