Self-Instantiated Recurrent Units with Dynamic Soft Recursion

Authors: Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan, Shuai Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the Self-IRU on a wide spectrum of sequence modeling tasks across multiple modalities: logical inference, sorting, tree traversal, music modeling, semantic parsing, code generation, and pixel-wise sequential image classification. Overall, the empirical results demonstrate architectural flexibility and effectiveness of the Self-IRU.
Researcher Affiliation | Collaboration | Amazon Web Services AI; Google Research; Mila, Université de Montréal; NTU, Singapore; ETH Zürich
Pseudocode | No | The paper includes mathematical equations and a model architecture diagram, but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'We run our experiments on the publicly released source code of [Yin and Neubig, 2018]', referring to a baseline's code. It does not explicitly state that the authors' own Self-IRU implementation is open source or provide a link to it.
Open Datasets | Yes | We use the well-established pixel-wise MNIST and CIFAR-10 datasets. We experiment for the logical inference task on the standard dataset proposed by Bowman et al. [2014]. We use three well-established datasets: Nottingham, JSB Chorales, and Piano Midi [Boulanger-Lewandowski et al., 2012].
Dataset Splits | No | For logical inference, the paper states the model is trained on sequences with 6 or fewer operations and evaluated on sequences of 6 to 12 operations, indicating a train/test split (see the sketch after the table). However, it does not explicitly specify a validation set or detailed train/validation/test percentages across all experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions running experiments on the 'publicly released source code of [Yin and Neubig, 2018]' and following its hyperparameter details. However, it does not list specific software dependencies or their version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Table 7 reports their optimal combinations for diverse tasks in the experiments, where the maximum recursion depth is evaluated on L ∈ {0, 1, 2, 3} (a sketch of such a sweep follows the table).
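
For concreteness, here is a minimal sketch of the length-generalization split described in the Dataset Splits row. It assumes each logical-inference example carries its operation count; the data layout and helper name are hypothetical, not taken from the paper or its code.

    # Hypothetical sketch: train on sequences with at most 6 logical
    # operations, evaluate per length bucket on 6 to 12 operations.
    TRAIN_MAX_OPS = 6
    EVAL_OPS = range(6, 13)  # evaluation buckets for 6..12 operations

    def split_by_num_ops(examples):
        """Partition (sequence, num_ops, label) triples by operation count."""
        train = [ex for ex in examples if ex[1] <= TRAIN_MAX_OPS]
        eval_buckets = {n: [ex for ex in examples if ex[1] == n]
                        for n in EVAL_OPS}
        return train, eval_buckets

Note that the 6-operation bucket appears in both partitions, mirroring the report's wording ("6 or fewer" for training, "6 to 12" for evaluation); the validation set the report flags as missing would have to be carved out of the training portion.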
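Likewise, a minimal sketch of the recursion-depth sweep noted in the Experiment Setup row. Here train_and_evaluate is a hypothetical stand-in for a full training run that returns a validation score; it is not the authors' code, only an illustration of evaluating the maximum recursion depth over L ∈ {0, 1, 2, 3}.

    def sweep_max_recursion_depth(train_and_evaluate, depths=(0, 1, 2, 3)):
        """Train once per maximum recursion depth L and keep the best score."""
        scores = {L: train_and_evaluate(max_recursion_depth=L) for L in depths}
        best_depth = max(scores, key=scores.get)
        return best_depth, scores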