Multiple-Weight Recurrent Neural Networks
Authors: Zhu Cao, Linlin Wang, Gerard de Melo
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our detailed experimental results show that our model outperforms previous work across a range of different tasks and datasets. |
| Researcher Affiliation | Academia | Zhu Cao¹, Linlin Wang¹ and Gerard de Melo². ¹IIIS, Tsinghua University, Beijing, China; ²Rutgers University, New Brunswick, NJ, USA. {cao-z13, ll-wang13}@mails.tsinghua.edu.cn, gdm@demelo.org |
| Pseudocode | No | The paper provides equations and diagrams of model structures but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use the standard Penn Treebank dataset [Marcus et al., 1993] for language modeling. We use the Who-did-What dataset [Onishi et al., 2016], which consists of around 0.2 million cloze questions. Additionally, we evaluate our model on two further datasets: (1) Restaurant Reservation Dialogs [Eric and Manning, 2017] and (2) the Switchboard Dialog Act Corpus [Liu and Lane, 2017]. |
| Dataset Splits | Yes | The [restaurant reservation] dataset is split into training, test, and validation sets, with 1,618, 1,117, and 500 dialogues, respectively. In the second (Switchboard) dataset, there are part-of-speech (POS) tags associated with the utterances. The data is split into 14 equal parts, 10 of which are used for training, one for testing, and the remaining three for validation (a minimal split sketch follows the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solver versions). |
| Experiment Setup | No | The paper mentions using 'mini-batch stochastic gradient descent', that 'The learning rate is controlled by AdaDelta [Zeiler, 2012]', and 'beam search' for testing, but it does not report specific hyperparameter values such as the learning rate, batch size, or number of epochs (a hedged sketch of this setup appears after the table). |
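
The Switchboard split quoted above is straightforward to reproduce. The following is a minimal sketch, assuming the dialogues are already loaded into a list; the shuffling step, the seed, and the function name are illustrative assumptions, not details from the paper:

```python
# Minimal sketch of the 14-way split described in the paper:
# 10 parts for training, 1 for testing, 3 for validation.
import random

def split_fourteen_ways(dialogues, seed=0):
    """Shuffle dialogues, cut them into 14 equal parts, and assign
    10 parts to training, 1 to testing, and 3 to validation."""
    rng = random.Random(seed)  # seed is an assumption; the paper gives none
    items = list(dialogues)
    rng.shuffle(items)
    part = len(items) // 14
    train = items[: 10 * part]
    test = items[10 * part : 11 * part]
    valid = items[11 * part : 14 * part]
    return train, test, valid

# Example with placeholder data:
train, test, valid = split_fourteen_ways(range(1400))
print(len(train), len(test), len(valid))  # 1000 100 300
```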
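For the experiment setup, the paper names the optimization scheme (mini-batch SGD with the learning rate controlled by AdaDelta [Zeiler, 2012]) but no hyperparameters. The sketch below shows what such a training loop could look like in PyTorch; the stand-in LSTM model, loss function, batch size, and epoch count are all assumptions, since the paper reports none of these values:

```python
# Hedged PyTorch sketch of the reported setup: mini-batch training with
# the learning rate controlled by AdaDelta. Everything not named in the
# paper (model, loss, batch size, epochs) is a placeholder.
import torch
import torch.nn as nn

model = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)  # stand-in model
optimizer = torch.optim.Adadelta(model.parameters())  # default rho/eps; paper gives no values
criterion = nn.MSELoss()  # placeholder loss

for epoch in range(10):  # epoch count is an assumption
    inputs = torch.randn(32, 20, 128)    # dummy mini-batch (batch size 32 assumed)
    targets = torch.randn(32, 20, 256)
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
```

Beam search, which the paper mentions only for testing, is not shown here.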