Multi-Zone Unit for Recurrent Neural Networks
Authors: Fandong Meng, Jinchao Zhang, Yang Liu, Jie Zhou (pp. 5150-5157)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on multiple datasets of the character-level language modeling task and the aspect-based sentiment analysis task demonstrate the superiority of the MZU. |
| Researcher Affiliation | Collaboration | Fandong Meng (1), Jinchao Zhang (1), Yang Liu (2), Jie Zhou (1); (1) WeChat AI, Pattern Recognition Center, Tencent Inc., China; (2) Department of Computer Science and Technology, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We use two standard datasets, namely the Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993) and the larger Wikipedia dataset (text8) (Mahoney 2011); and 2) the aspect-based sentiment analysis (ABSA) task with two datasets of the SemEval 2014 Task 4 (Pontiki et al. 2014) in different domains. |
| Dataset Splits | Yes | We follow the preprocessing procedure introduced in (Mikolov et al. 2012), and split the data into training, validation and test sets consisting of 5.0M, 390K and 440K characters, respectively. (A split sketch is given after this table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We train the model using Adam (Kingma and Ba 2014) with an initial learning rate of 0.001. Each update is done by using a mini-batch of 256 examples. ... The dropout rates for the Penn Treebank task and the text8 task are set to 0.5 and 0.3, respectively. The hidden size and filter size for Penn Treebank are set to 800 and 1,000, respectively, and those for text8 are set to 1,536 and 3,072, respectively. The embedding size is 256. ... We set λ to 1.0 in Eq. 19 for all experiments. (These values are collected in the configuration sketch below.) |
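The split quoted in the Dataset Splits row is mechanical to reproduce. The sketch below is a minimal, hypothetical illustration (the paper releases no code): it slices a flat character stream into training/validation/test segments of 5.0M, 390K and 440K characters. The file name `ptb.char.txt` and the helper name `split_chars` are assumptions, not from the paper.

```python
# Hypothetical character-level split matching the sizes quoted above
# (following the Mikolov et al. 2012 split); the input file name is an assumption.

def split_chars(path, n_train, n_valid, n_test):
    """Slice a flat character stream into train/valid/test segments."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    train = text[:n_train]
    valid = text[n_train:n_train + n_valid]
    test = text[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test

train, valid, test = split_chars("ptb.char.txt", 5_000_000, 390_000, 440_000)
```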
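Likewise, the hyperparameters quoted in the Experiment Setup row can be gathered into one configuration object. This is a sketch under stated assumptions, not the authors' implementation; the class and field names (`MZUConfig`, `hidden_size`, `lam`, ...) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class MZUConfig:
    # Per-dataset settings quoted from the paper.
    hidden_size: int
    filter_size: int
    dropout: float
    # Settings shared across all experiments, per the quote.
    embedding_size: int = 256
    learning_rate: float = 1e-3   # Adam, initial learning rate
    batch_size: int = 256
    lam: float = 1.0              # λ in Eq. 19 of the paper

PTB = MZUConfig(hidden_size=800, filter_size=1_000, dropout=0.5)
TEXT8 = MZUConfig(hidden_size=1_536, filter_size=3_072, dropout=0.3)
```

Note that, per the quote, only the dropout rate, hidden size and filter size vary between the two character-level tasks; the optimizer, batch size, embedding size and λ are held fixed across experiments.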