Self-Instantiated Recurrent Units with Dynamic Soft Recursion
Authors: Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan, Shuai Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the Self-IRU on a wide spectrum of sequence modeling tasks across multiple modalities: logical inference, sorting, tree traversal, music modeling, semantic parsing, code generation, and pixel-wise sequential image classification. Overall, the empirical results demonstrate architectural flexibility and effectiveness of the Self-IRU. |
| Researcher Affiliation | Collaboration | Amazon Web Services AI; Google Research; Mila, Université de Montréal; NTU, Singapore; ETH Zürich |
| Pseudocode | No | The paper includes mathematical equations and a model architecture diagram, but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We run our experiments on the publicly released source code of [Yin and Neubig, 2018]', referring to a baseline's code. It does not explicitly state that the authors' own Self-IRU implementation code is open-source or provide a link to it. |
| Open Datasets | Yes | We use the well-established pixel-wise MNIST and CIFAR-10 datasets. We experiment for the logical inference task on the standard dataset proposed by Bowman et al. [2014]. We use three well-established datasets: Nottingham, JSB Chorales, and Piano Midi [Boulanger-Lewandowski et al., 2012]. |
| Dataset Splits | No | For logical inference, the paper states the model is trained on sequences with 6 or fewer operations and evaluated on sequences of 6 to 12 operations, indicating a train/test split (see the split sketch after this table). However, it does not explicitly specify a validation set or train/validation/test split percentages for all experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions running experiments on the 'publicly released source code of [Yin and Neubig, 2018]' and following its hyperparameter details. However, it does not explicitly list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions) within the paper. |
| Experiment Setup | Yes | Table 7 reports the optimal hyperparameter combinations for the diverse tasks in the experiments, where the maximum recursion depth is evaluated over L = {0, 1, 2, 3} (see the depth-sweep sketch after this table). |
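
The logical inference split described in the Dataset Splits row (train on sequences with 6 or fewer operations, evaluate on sequences of 6 to 12 operations) amounts to a filter on operation count. Below is a minimal sketch of that protocol; the `num_ops` field and the function name are hypothetical and not taken from the paper or the released dataset.

```python
from typing import Dict, List, Tuple


def split_by_num_ops(
    examples: List[Dict],
    train_max_ops: int = 6,
    eval_min_ops: int = 6,
    eval_max_ops: int = 12,
) -> Tuple[List[Dict], List[Dict]]:
    """Split logical-inference examples by number of logical operations.

    Mirrors the protocol quoted in the table: train on sequences with at most
    `train_max_ops` operations and evaluate generalization on sequences with
    `eval_min_ops` to `eval_max_ops` operations. The `num_ops` annotation is a
    hypothetical per-example field, not part of any released API.
    """
    train = [ex for ex in examples if ex["num_ops"] <= train_max_ops]
    evaluation = [
        ex for ex in examples if eval_min_ops <= ex["num_ops"] <= eval_max_ops
    ]
    return train, evaluation
```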
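
The Experiment Setup row notes that the maximum recursion depth is chosen per task from L = {0, 1, 2, 3} (Table 7). The sketch below shows one way such a depth sweep could be run, assuming a caller-supplied `train_and_score` hook that trains a Self-IRU model at a given depth and returns a validation score; it is not the authors' tuning code.

```python
from typing import Callable, Dict, Iterable, Tuple

# Candidate maximum recursion depths reported in the paper's setup (Table 7).
MAX_DEPTHS: Tuple[int, ...] = (0, 1, 2, 3)


def sweep_max_recursion_depth(
    train_and_score: Callable[[int], float],
    depths: Iterable[int] = MAX_DEPTHS,
) -> Tuple[int, Dict[int, float]]:
    """Grid-search the maximum recursion depth L.

    `train_and_score` is a hypothetical hook that trains a model with the
    given maximum recursion depth and returns a validation score (higher is
    better). Returns the best depth and the score for every candidate.
    """
    scores = {depth: train_and_score(depth) for depth in depths}
    best_depth = max(scores, key=scores.get)
    return best_depth, scores
```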