Contextualized Non-Local Neural Networks for Sequence Learning
Authors: Pengfei Liu, Shuaichen Chang, Xuanjing Huang, Jian Tang, Jackie Chi Kit Cheung
AAAI 2019, pp. 6762-6769 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users. |
| Researcher Affiliation | Academia | School of Computer Science, Fudan University; Shanghai Institute of Intelligent Electronics & Systems; MILA & McGill University; The Ohio State University. {pfliu14,xjhuang}@fudan.edu.cn, chang.1692@osu.edu, jian.tang@hec.ca, jcheung@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1 Learning Processes of Contextualized Nonlocal Neural Networks for Sequences |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We choose two typical datasets SICK (Marelli et al. 2014) and SNLI (Bowman et al. 2015) for this task. Sequence Labelling: We choose POS, Chunking and NER as evaluation tasks on Penn Treebank, CoNLL 2000 and CoNLL 2003 respectively. |
| Dataset Splits | No | The paper states, 'For each task, we take the hyperparameters which achieve the best performance on the development set via grid search.' This implies a validation set (development set), but it does not specify the splits (percentages or counts) for any of the datasets mentioned (QC, SST2, MR, IMDB, SICK, SNLI, POS, Chunking, NER). |
| Hardware Specification | No | The paper does not mention any specific hardware (GPU model, CPU model, memory, etc.) used for the experiments. |
| Software Dependencies | No | The paper mentions 'stochastic gradient descent with the diagonal variant of AdaDelta (Zeiler 2012)', 'GloVe vectors (Pennington, Socher, and Manning 2014)', and the 'Stanford NLP toolkit (Manning et al. 2014)'. While these are software/tools, no specific version numbers are provided for any of them. |
| Experiment Setup | Yes | To minimize the objective, we use stochastic gradient descent with the diagonal variant of AdaDelta (Zeiler 2012). The word embeddings for all of the models are initialized with GloVe vectors (Pennington, Socher, and Manning 2014). The other parameters are initialized by randomly sampling from a uniform distribution in [-0.1, 0.1]. For each task, we take the hyperparameters which achieve the best performance on the development set via grid search. |
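The "Experiment Setup" row above amounts to a concrete training configuration: AdaDelta optimization, GloVe-initialized word embeddings, and remaining parameters drawn uniformly from [-0.1, 0.1]. The sketch below is a minimal, hedged illustration of such a setup in PyTorch; it is not the authors' code, and the vocabulary size, embedding dimension, stand-in model body, and `glove_matrix` placeholder are all assumptions.

```python
# Hypothetical sketch of the described setup (not the paper's implementation).
import torch
import torch.nn as nn

vocab_size, embed_dim = 20000, 300           # placeholder sizes
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),     # word embeddings, to be set from GloVe
    nn.Linear(embed_dim, 2),                 # stand-in for the paper's encoder/classifier
)

# Initialize all non-embedding parameters uniformly in [-0.1, 0.1],
# as stated in the Experiment Setup row.
for name, param in model.named_parameters():
    if not name.startswith("0."):            # skip the embedding layer
        nn.init.uniform_(param, -0.1, 0.1)

# Pre-trained GloVe vectors would overwrite the embedding weights here, e.g.:
# model[0].weight.data.copy_(torch.from_numpy(glove_matrix))  # glove_matrix is hypothetical

# AdaDelta optimizer, as referenced in the paper (Zeiler 2012).
optimizer = torch.optim.Adadelta(model.parameters())
```

Hyperparameters (hidden sizes, learning-rate-related settings, etc.) would then be chosen by grid search on each task's development set, as the paper states.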