Learning from Explanations with Neural Execution Tree

Authors: Ziqi Wang*, Yujia Qin*, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves performance gain with light annotation effort. We conduct experiments on two tasks: relation extraction and aspect-term-level sentiment analysis.
Researcher Affiliation | Collaboration | Ziqi Wang (1), Yujia Qin (1), Wenxuan Zhou (2), Jun Yan (2), Qinyuan Ye (2), Leonardo Neves (3), Zhiyuan Liu (1), Xiang Ren (2); (1) Tsinghua University, (2) University of Southern California, (3) Snap Research
Pseudocode | Yes | Algorithm 1: Learning on Unlabeled Data with NExT
Open Source Code | Yes | Code: https://github.com/INK-USC/NExT
Open Datasets | Yes | For RE we choose two datasets, TACRED (Zhang et al., 2017) and SemEval (Hendrickx et al., 2009) in our experiments. For this task we use two customer review datasets, Restaurant and Laptop, which are part of SemEval 2014 Task 4.
Dataset Splits | No | The paper discusses training on labeled and unlabeled data and reports F1 scores on the test set, but it does not specify explicit training/validation/test splits (e.g., percentages or counts for each split) needed for reproduction across all datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components like GloVe embeddings and Adagrad optimizer, but does not provide specific version numbers for these or other key libraries/frameworks (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | We use 300-dimensional word embeddings pre-trained by GloVe (Pennington et al., 2014). The dropout rate is 0.96 for word embeddings and 0.5 for sentence encoder. The hidden state size of the encoder and attention layer is 300 and 200 respectively. We choose Adagrad as the optimizer and the learning rate for joint model learning is 0.5. For TACRED, we set the learning rate to 0.1 in the pretraining stage. The total epochs for pretraining are 10. The weight for Lsim is set to 0.5. The batch size for pretraining is set to 100. For training the classifier, the batch size for labeled data and unlabeled data is 50 and 100 respectively, the weight α for Lu is set to 0.7, the weight β for Lstring is set to 0.2, the weight γ for Lsim is set to 2.5.
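The Experiment Setup row quotes most of the reported hyperparameters. Below is a minimal sketch of how that configuration could be written down, assuming a PyTorch implementation and assuming the loss terms named in the excerpt (Lu, Lstring, Lsim) are combined as a plain weighted sum with the labeled-data loss; the exact combination is not spelled out in this excerpt. All identifiers (CONFIG, combined_loss, the placeholder model) are hypothetical, and only the numeric values come from the quoted setup.

# Hypothetical sketch of the hyperparameters quoted in the "Experiment Setup" row.
# Only the numeric values are taken from the excerpt; all names are assumptions.
import torch

CONFIG = {
    "word_emb_dim": 300,        # 300-d GloVe embeddings (Pennington et al., 2014)
    "dropout_word_emb": 0.96,   # dropout rate reported for word embeddings
    "dropout_encoder": 0.5,     # dropout rate for the sentence encoder
    "hidden_encoder": 300,      # hidden state size of the encoder
    "hidden_attention": 200,    # hidden state size of the attention layer
    "optimizer": "Adagrad",
    "lr_joint": 0.5,            # learning rate for joint model learning
    "lr_pretrain_tacred": 0.1,  # pretraining learning rate on TACRED
    "pretrain_epochs": 10,
    "pretrain_sim_weight": 0.5, # weight for Lsim during pretraining
    "pretrain_batch_size": 100,
    "batch_labeled": 50,        # classifier batch size for labeled data
    "batch_unlabeled": 100,     # classifier batch size for unlabeled data
    "alpha_unlabeled": 0.7,     # weight α for Lu
    "beta_string": 0.2,         # weight β for Lstring
    "gamma_sim": 2.5,           # weight γ for Lsim
}

def combined_loss(l_labeled, l_u, l_string, l_sim, cfg=CONFIG):
    # Assumed weighted sum of the loss terms named in the excerpt;
    # the paper section quoted here gives the weights but not the formula.
    return (l_labeled
            + cfg["alpha_unlabeled"] * l_u
            + cfg["beta_string"] * l_string
            + cfg["gamma_sim"] * l_sim)

# Example: build the Adagrad optimizer with the reported joint-training learning rate.
model = torch.nn.Linear(CONFIG["word_emb_dim"], 10)  # placeholder module and class count
optimizer = torch.optim.Adagrad(model.parameters(), lr=CONFIG["lr_joint"])

Note that the excerpt gives separate values for the pretraining stage (Lsim weight 0.5, learning rate 0.1 on TACRED, 10 epochs, batch size 100), so a pretraining loop would use those rather than the joint-training settings above.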