Contextualized Non-Local Neural Networks for Sequence Learning

Authors: Pengfei Liu, Shuaichen Chang, Xuanjing Huang, Jian Tang, Jackie Chi Kit Cheung

AAAI 2019, pp. 6762-6769 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
Researcher Affiliation | Academia | School of Computer Science, Fudan University; Shanghai Institute of Intelligent Electronics & Systems; MILA & McGill University; The Ohio State University. {pfliu14,xjhuang}@fudan.edu.cn, chang.1692@osu.edu, jian.tang@hec.ca, jcheung@cs.mcgill.ca
Pseudocode | Yes | Algorithm 1: Learning Processes of Contextualized Non-local Neural Networks for Sequences (a generic sketch of the underlying non-local operation appears after the table).
Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We choose two typical datasets SICK (Marelli et al. 2014) and SNLI (Bowman et al. 2015) for this task. Sequence Labelling: We choose POS, Chunking and NER as evaluation tasks on Penn Treebank, CoNLL 2000 and CoNLL 2003 respectively.
Dataset Splits | No | The paper states, 'For each task, we take the hyperparameters which achieve the best performance on the development set via grid search.' This implies the existence of a development set, but the paper does not specify split percentages or example counts for any of the datasets (QC, SST2, MR, IMDB, SICK, SNLI, POS, Chunking, NER).
Hardware Specification | No | The paper does not mention any specific hardware (GPU model, CPU model, memory, etc.) used for the experiments.
Software Dependencies | No | The paper mentions 'stochastic gradient descent with the diagonal variant of AdaDelta (Zeiler 2012)', 'GloVe vectors (Pennington, Socher, and Manning 2014)', and the 'Stanford NLP toolkit (Manning et al. 2014)'. These are software dependencies, but no version numbers are given for any of them.
Experiment Setup | Yes | To minimize the objective, we use stochastic gradient descent with the diagonal variant of AdaDelta (Zeiler 2012). The word embeddings for all of the models are initialized with GloVe vectors (Pennington, Socher, and Manning 2014). The other parameters are initialized by randomly sampling from a uniform distribution in [-0.1, 0.1]. For each task, we take the hyperparameters which achieve the best performance on the development set via grid search.
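The quoted setup maps naturally to a few lines of framework code. Below is a minimal sketch of that setup, assuming PyTorch; the model body, vocabulary size, and the glove_matrix loader are placeholders, since the paper releases no code and names no framework.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper selects hyperparameters per task via grid search.
vocab_size, embed_dim, hidden_dim, num_classes = 20000, 300, 128, 2

embedding = nn.Embedding(vocab_size, embed_dim)
model = nn.Sequential(  # placeholder for the paper's actual model body
    nn.Linear(embed_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, num_classes),
)

# Word embeddings initialized from pre-trained GloVe vectors; glove_matrix
# stands in for a (vocab_size, embed_dim) tensor loaded from the released
# GloVe files.
glove_matrix = torch.randn(vocab_size, embed_dim)
embedding.weight.data.copy_(glove_matrix)

# All other parameters: uniform initialization in [-0.1, 0.1], as stated.
for p in model.parameters():
    nn.init.uniform_(p, -0.1, 0.1)

# "Stochastic gradient descent with the diagonal variant of AdaDelta
# (Zeiler 2012)" maps to torch.optim.Adadelta here.
optimizer = torch.optim.Adadelta(
    list(embedding.parameters()) + list(model.parameters())
)
```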
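The Algorithm 1 pseudocode itself is not reproduced in this report. As a rough orientation, here is a minimal PyTorch sketch of a generic dot-product non-local operation of the kind the paper's model builds on; it is not the contextualized variant from Algorithm 1, and the class name, projection layers, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Generic dot-product non-local operation over a sequence.

    A sketch of the standard non-local building block, not the paper's
    contextualized variant.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Affinity between every pair of positions: (batch, seq_len, seq_len)
        attn = F.softmax(q @ k.transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        # Each position aggregates features from all positions (the
        # "non-local" step), with a residual connection to the input.
        return x + attn @ v


# Usage: a batch of 2 sequences, 5 tokens each, 16-dim features.
block = NonLocalBlock(16)
out = block(torch.randn(2, 5, 16))  # -> (2, 5, 16)
```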