Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Self-Instantiated Recurrent Units with Dynamic Soft Recursion
Authors: Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan, Shuai Zhang
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the Self-IRU on a wide spectrum of sequence modeling tasks across multiple modalities: logical inference, sorting, tree traversal, music modeling, semantic parsing, code generation, and pixel-wise sequential image classification. Overall, the empirical results demonstrate architectural flexibility and effectiveness of the Self-IRU. |
| Researcher Affiliation | Collaboration | Amazon Web Services AI; Google Research; Mila, Université de Montréal; NTU, Singapore; ETH Zürich |
| Pseudocode | No | The paper includes mathematical equations and a model architecture diagram, but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We run our experiments on the publicly released source code of [Yin and Neubig, 2018]', referring to a baseline's code. It does not explicitly state that the authors' own Self-IRU implementation is open source, nor does it provide a link to it. |
| Open Datasets | Yes | We use the well-established pixel-wise MNIST and CIFAR-10 datasets. We experiment for the logical inference task on the standard dataset proposed by Bowman et al. [2014]. We use three well-established datasets: Nottingham, JSB Chorales, and Piano Midi [Boulanger-Lewandowski et al., 2012]. |
| Dataset Splits | No | For logical inference, the paper states the model is trained on sequences with 6 or fewer operations and evaluated on sequences of 6 to 12 operations, indicating a train/test split. However, it does not explicitly specify a validation set or detailed split percentages for train/validation/test across all experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions running experiments on the 'publicly released source code of [Yin and Neubig, 2018]' and following its hyperparameter details. However, it does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | Table 7 reports their optimal combinations for diverse tasks in the experiments, where the maximum recursion depth is evaluated on L = {0, 1, 2, 3}. |