Modeling Local Dependence in Natural Language with Multi-Channel Recurrent Neural Networks
Authors: Chang Xu, Weiran Huang, Hongwei Wang, Gang Wang, Tie-Yan Liu (pp. 5525-5532)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of MC-RNN, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. Experimental results on these tasks all show significant improvements of MC-RNN over current top systems. |
| Researcher Affiliation | Collaboration | Chang Xu (1), Weiran Huang (2), Hongwei Wang (3), Gang Wang (4), Tie-Yan Liu (5); (1, 4) College of Computer Science, Nankai University, {changxu, wgzwp}@nbjl.nankai.edu.cn; (2) Tsinghua University, huang.inbox@outlook.com; (3) Shanghai Jiao Tong University, wanghongwei55@gmail.com; (5) Microsoft Research, tie-yan.liu@microsoft.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, such as a specific repository link or an explicit code release statement. |
| Open Datasets | Yes | The data we use is the German English (De-En for short) machine translation track of the IWSLT 2014 evaluation campaign (Cettolo et al. 2014)... The dataset we use is Gigaword corpus (Graff et al. 2003)... We conduct our experiments on the Penn Treebank corpus which contains about 1 million words (Mikolov et al. 2010) |
| Dataset Splits | Yes | The training/dev/test dataset respectively contains about 153k/7k/7k De-En sentence pairs... resulting in 3.8M training article-headline pairs, 190k for validation and 2000 for test. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts) used for running its experiments, only general statements about RNNs. |
| Software Dependencies | No | The paper mentions techniques like LSTM and BPE, and refers to a model (AWD-LSTM) typically implemented in specific frameworks, but it does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA). |
| Experiment Setup | Yes | The encoders and decoders of our model and Baseline-RNN are all equipped with 2-layer LSTM with word embedding size 256 and hidden state size 256... During training, we automatically halve the learning rate according to validation performance on dev set and stop when the performance is not improved any more... The mini-batch size is 64 and the learning rate is halved when the dev performance stops increasing... using a stacked three-layer LSTM model, with 1150 units in the hidden layer and 400-dimensional word embeddings. Drop Connect is used on the hidden-to-hidden weight matrices. |
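
The Experiment Setup row above maps onto a fairly standard recipe. The sketch below is a minimal PyTorch reconstruction (PyTorch is an assumption; the paper names no framework or versions) of the baseline language-modeling configuration only: a stacked 3-layer LSTM with 1150 hidden units, 400-dimensional word embeddings, DropConnect on the hidden-to-hidden weight matrices, and learning-rate halving when dev performance stops improving. It is not the authors' MC-RNN, for which no code was released; the class names, the optimizer, its learning rate, and the vocabulary and batch sizes are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightDropLSTM(nn.LSTM):
    """nn.LSTM with DropConnect on the hidden-to-hidden matrices.

    The raw weights are kept under '<name>_raw'; a dropped-out copy is
    written back to '<name>' before every forward pass (the usual
    AWD-LSTM-style weight-dropout trick).
    """

    def __init__(self, *args, weight_dropout=0.5, **kwargs):
        super().__init__(*args, **kwargs)
        self.weight_dropout = weight_dropout
        self._hh_names = [f"weight_hh_l{k}" for k in range(self.num_layers)]
        for name in self._hh_names:
            raw = getattr(self, name)
            del self._parameters[name]  # detach from the stock LSTM
            self.register_parameter(name + "_raw", nn.Parameter(raw.data))

    def forward(self, x, hx=None):
        for name in self._hh_names:
            raw = getattr(self, name + "_raw")
            # DropConnect drops individual weights rather than activations.
            setattr(self, name,
                    F.dropout(raw, p=self.weight_dropout, training=self.training))
        return super().forward(x, hx)


class BaselineLM(nn.Module):
    """Stacked-LSTM language model with the hyperparameters quoted above."""

    def __init__(self, vocab_size, emb_size=400, hidden_size=1150,
                 num_layers=3, weight_dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.rnn = WeightDropLSTM(emb_size, hidden_size, num_layers,
                                  batch_first=True,
                                  weight_dropout=weight_dropout)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden=None):
        out, hidden = self.rnn(self.embed(tokens), hidden)
        return self.decoder(out), hidden


if __name__ == "__main__":
    model = BaselineLM(vocab_size=10000)            # vocab size is illustrative
    optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # lr is illustrative
    # "halve the learning rate when dev performance stops improving"
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5)
    tokens = torch.randint(0, 10000, (64, 35))      # dummy batch, illustrative sizes
    logits, _ = model(tokens)
    print(logits.shape)                             # (64, 35, 10000)
```

The seq2seq settings quoted for translation and summarization (2-layer LSTM encoder and decoder, 256-dimensional embeddings and hidden states, mini-batch size 64) would slot into the same skeleton with those sizes substituted.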