Multi-Task Recurrent Modular Networks
Authors: Dongkuan Xu, Wei Cheng, Xin Dong, Bo Zong, Wenchao Yu, Jingchao Ni, Dongjin Song, Xuchao Zhang, Haifeng Chen, Xiang Zhang. Pages 10496-10504.
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on three multi-task sequence processing datasets consistently demonstrate the effectiveness of MT-RMN. In our experiments, all the models are trained by Adam (Kingma and Ba 2014). The reported results were obtained by 5 times 5-fold cross validation. To make fair comparisons, we tuned the baselines as much as we could. |
| Researcher Affiliation | Collaboration | Dongkuan Xu (1), Wei Cheng (2), Xin Dong (3), Bo Zong (2), Wenchao Yu (2), Jingchao Ni (2), Dongjin Song (2), Xuchao Zhang (2), Haifeng Chen (2), Xiang Zhang (1). (1) The Pennsylvania State University; (2) NEC Laboratories America, Inc.; (3) Rutgers University |
| Pseudocode | No | The paper describes the architecture and mathematical formulations of the proposed method (e.g., equations for hidden states, policy networks, input/output calculations). However, it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Data and code can be found on the authors' website. |
| Open Datasets | Yes | We use the Hindi-English code-switched dataset provided in (Patra, Das, and Das 2018) for the primary task. The dataset has 19 POS tags and contains 2102 and 528 instances for the training and test sets respectively. We use the Hindi POS tagging dataset provided in (Sachdeva et al. 2014) and the English one provided in (Sang and Buchholz 2000) for the auxiliary tasks of Hindi and English respectively. |
| Dataset Splits | Yes | The reported results were obtained by 5 times 5-fold cross validation (see the cross-validation sketch after the table). |
| Hardware Specification | No | The paper mentions training models and running experiments but provides no specific details about the hardware used (e.g., GPU/CPU models, memory, or specific computing environments). |
| Software Dependencies | No | The paper mentions that “all the models are trained by Adam (Kingma and Ba 2014)” which is an optimizer. However, it does not provide specific version numbers for any programming languages, libraries, frameworks (e.g., Python, PyTorch, TensorFlow), or other key software components used for replication. |
| Experiment Setup | Yes | The learning rate was set to 10⁻³ initially and decreased during training. We experimented with the values of k, λ1, λ2, γ1, γ2 in the sets {2, 3, 4}, {2⁻², 2⁻¹, 2⁰, 2¹, 2²}, {2⁻², 2⁻¹, 2⁰, 2¹, 2²}, {0.25, 0.5, 0.75, 1}, {0.25, 0.5, 0.75, 1} respectively. We obtained the best results using k=3, λ1=1, λ2=1, γ1=0.75, γ2=0.75 in general. For the temperature τ used in the Gumbel-Max trick, we set its value to 100 initially and divided it by 2 at each epoch (a schedule sketch follows the table). |
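
The paper reports results averaged over 5 repetitions of 5-fold cross validation (25 train/validation runs in total). The sketch below shows how such a protocol is commonly set up; the features, labels, and stand-in classifier are placeholders for illustration, not the authors' MT-RMN model or data.

```python
# Minimal sketch of a "5 times 5-fold cross validation" protocol,
# assuming a generic classifier; MT-RMN itself is not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32))   # placeholder features
y = rng.integers(0, 19, 500)         # placeholder labels (19 POS tags in the primary task)

rkf = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)  # 25 runs in total
scores = []
for train_idx, val_idx in rkf.split(X):
    clf = LogisticRegression(max_iter=200).fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[val_idx], y[val_idx]))

print(f"mean accuracy over 25 folds: {np.mean(scores):.4f} +/- {np.std(scores):.4f}")
```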
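The Experiment Setup row also describes Adam with an initial learning rate of 10⁻³ and a Gumbel-Max temperature τ that starts at 100 and is halved every epoch. The sketch below illustrates that schedule using PyTorch's Gumbel-Softmax relaxation; the linear module, dummy loss, and epoch count are assumptions for illustration only.

```python
# Sketch of the reported optimization schedule: Adam at lr=1e-3 and a
# Gumbel-Softmax temperature halved each epoch. The model is a stand-in.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(32, 19)                      # placeholder module, not MT-RMN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

tau = 100.0                                          # initial temperature from the paper
for epoch in range(10):                              # epoch count is an assumption
    logits = model(torch.randn(8, 32))
    # Differentiable sampling via the Gumbel-Softmax relaxation of the
    # Gumbel-Max trick; a higher tau yields softer, more uniform samples.
    samples = F.gumbel_softmax(logits, tau=tau, hard=False)
    loss = samples.sum()                             # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    tau /= 2.0                                       # halve the temperature each epoch
```

Annealing τ toward 0 makes the relaxed samples approach discrete one-hot choices, which is why such schedules start soft and sharpen over training.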