Long Short-Term Memory with Dynamic Skip Connections
Authors: Tao Gui, Qi Zhang, Lujun Zhao, Yaosong Lin, Minlong Peng, Jingjing Gong, Xuanjing Huang (pp. 6481-6488)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. |
| Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; School of Computer Science, Fudan University; Shanghai Institute of Intelligent Electronics & Systems; 825 Zhangheng Road, Shanghai, China |
| Pseudocode | No | The paper describes its model and algorithms using mathematical equations and text, but it does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide a link to its source code or explicitly state that the code for the described methodology is open-source or publicly available. Footnote 1 refers to the arXiv preprint of the paper itself. |
| Open Datasets | Yes | The datasets used in the experiments are listed in Table 1. ... CoNLL2003 shared task (Tjong Kim Sang and De Meulder 2003). ... Penn Treebank language model corpus (Marcus, Marcinkiewicz, and Santorini 1993). ... IMDB dataset (Maas et al. 2011). |
| Dataset Splits | Yes | Table 1 lists #Dev (development/validation set) counts for the CoNLL2003, Penn Treebank, and Number Prediction datasets. For IMDB: "We randomly set aside about 15% of the training data for validation." (See the holdout-split sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions training times (e.g., "training required 9.98 hours"). |
| Software Dependencies | No | The paper mentions the use of an "Adam optimizer (Kingma and Ba 2014)" but does not provide specific version numbers for any software, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | General experiment settings. "For the fair comparison, we use the same hyperparameters and optimizer with each baseline model of different tasks, which will be detailed in each experiment." ... NER: "λ = 1, K = 5". LM: "two layers of LSTM with 650 units, and the weights are initialized uniformly [-0.05, +0.05]. The gradients backpropagate for 35 time steps using stochastic gradient descent, with a learning rate initially set to 1.0. The norm of the gradients is constrained to be below five. Dropout with a probability of 0.5". Sentiment Analysis: "one layer and 128 hidden units, and the batch size is 50. ... Dropout with a rate of 0.2 ... We set λ and K to 0.5 and 3, respectively.". Number Prediction: "The Adam optimizer (Kingma and Ba 2014) trained with cross-entropy loss is used with 0.001 as the default learning rate." (See the language-model configuration sketch after this table.) |
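
The IMDB split described above ("about 15% of the training data for validation") can be reproduced with a simple random holdout. The sketch below is a minimal illustration, not the authors' code: the data-loading format (a list of `(text, label)` pairs), the `seed`, and the function name are assumptions; only the 15% fraction comes from the paper.

```python
import random

def holdout_split(train_examples, val_fraction=0.15, seed=0):
    """Randomly set aside a fraction of the training data for validation.

    `train_examples` is any list of (text, label) pairs; the 15% fraction
    follows the paper's description for IMDB, while the seed and data
    format are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(len(train_examples)))
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx = set(indices[:n_val])
    train = [ex for i, ex in enumerate(train_examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(train_examples) if i in val_idx]
    return train, val
```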
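
The language-model settings quoted in the Experiment Setup row (two LSTM layers of 650 units, uniform initialization in [-0.05, +0.05], 35-step truncated backpropagation, SGD with initial learning rate 1.0, gradient norm clipped below five, dropout 0.5) map onto a standard PyTorch training loop. The sketch below covers only these reported baseline hyperparameters; it does not implement the paper's dynamic skip connections, and the vocabulary size and batching are assumptions.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the table; the vocabulary size is an assumption.
VOCAB_SIZE = 10000      # standard PTB vocabulary (assumed)
HIDDEN = 650            # "two layers of LSTM with 650 units"
BPTT = 35               # "gradients backpropagate for 35 time steps"
LR = 1.0                # initial SGD learning rate
CLIP = 5.0              # gradient-norm constraint
DROPOUT = 0.5           # dropout probability

class LMBaseline(nn.Module):
    """Plain two-layer LSTM language model using the reported settings.

    This reflects only the baseline hyperparameters from the table; the
    paper's dynamic skip mechanism is not reproduced here.
    """
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.drop = nn.Dropout(DROPOUT)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, num_layers=2,
                            dropout=DROPOUT, batch_first=True)
        self.decoder = nn.Linear(HIDDEN, VOCAB_SIZE)
        # "the weights are initialized uniformly [-0.05, +0.05]"
        for p in self.parameters():
            nn.init.uniform_(p, -0.05, 0.05)

    def forward(self, tokens, hidden=None):
        x = self.drop(self.embed(tokens))
        out, hidden = self.lstm(x, hidden)
        return self.decoder(self.drop(out)), hidden

model = LMBaseline()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

def train_step(inputs, targets, hidden=None):
    """One truncated-BPTT step over a (batch, BPTT) window of token ids.

    The gradient norm is clipped below five, as reported in the paper.
    """
    optimizer.zero_grad()
    logits, hidden = model(inputs, hidden)
    loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
    optimizer.step()
    # Detach the hidden state so the next window does not backprop further.
    hidden = tuple(h.detach() for h in hidden)
    return loss.item(), hidden
```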