Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability
Authors: Jishnu Ray Chowdhury, Cornelia Caragea
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments and Results; In Table 1, we compare the empirical time-memory trade-offs of the most relevant Tree-RvNN models. |
| Researcher Affiliation | Academia | Jishnu Ray Chowdhury, Cornelia Caragea; Computer Science, University of Illinois Chicago; jraych2@uic.edu, cornelia@uic.edu |
| Pseudocode | No | No section or figure explicitly labeled "Pseudocode" or "Algorithm" was found. |
| Open Source Code | Yes | Our code is available at: https://github.com/JRC1995/BeamRecursionFamily/. |
| Open Datasets | Yes | ListOps was originally introduced by Nangia and Bowman [70].; Long Range Arena (LRA): LRA is a set of tasks designed to evaluate the capacities of neural models for modeling long-range dependencies [92].; Logical Inference was introduced by Bowman et al. [6]. |
| Dataset Splits | Yes | We use the original development set for validation. We test on the original test set (near-IID split); the length generalization splits from Havrylov et al. [38]...; Our actual development set is a random sample of 10,000 data points from the filtered training set. |
| Hardware Specification | Yes | Table 1: Empirical time and (peak) memory consumption for various models on an RTX A6000.; All models were trained on a single Nvidia RTX A6000. |
| Software Dependencies | No | The paper mentions software like S4D, but does not provide specific version numbers for underlying libraries or programming languages (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For RIR-EBT-GRC, we use a beam size of 7 for all tasks except Retrieval LRA, where we use a beam size of 5. All other hyperparameters are unchanged from BT-GRC for RIR models, or from BBT-GRC for the earlier tasks. ... We initialize S4D (whether using the pure S4D model or S4D for pre-chunk processing) in S4D-Inv mode and use bilinear discretization. For LRA tasks, we use the same hyperparameters as Gu et al. [30] for S4D. |
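Two of the quoted setup details — a development set drawn as a random sample of 10,000 points from the filtered training set, and a task-dependent beam size for RIR-EBT-GRC (5 on Retrieval LRA, 7 elsewhere) — can be sketched as below. This is an illustrative sketch only: the function names, task-identifier strings, and random seed are assumptions, not taken from the authors' released code.

```python
import random


def split_train_dev(train_examples, dev_size=10_000, seed=0):
    """Sample a development set of `dev_size` points from the filtered
    training set, as the paper describes. The seed is an assumption."""
    rng = random.Random(seed)
    idx = list(range(len(train_examples)))
    rng.shuffle(idx)
    dev = [train_examples[i] for i in idx[:dev_size]]
    train = [train_examples[i] for i in idx[dev_size:]]
    return train, dev


def beam_size(task: str) -> int:
    """Beam size reported for RIR-EBT-GRC: 5 on Retrieval LRA, 7 on all
    other tasks. The task-name strings are hypothetical identifiers."""
    return 5 if task == "retrieval_lra" else 7
```

Sampling indices rather than examples keeps the split disjoint by construction, mirroring the usual practice when a benchmark ships no official validation split.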