Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability

Authors: Jishnu Ray Chowdhury, Cornelia Caragea

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experiments and Results"; "In Table 1, we compare the empirical time-memory trade-offs of the most relevant Tree-RvNN models."
Researcher Affiliation | Academia | "Jishnu Ray Chowdhury, Cornelia Caragea; Computer Science, University of Illinois Chicago; jraych2@uic.edu, cornelia@uic.edu"
Pseudocode | No | No section or figure explicitly labeled "Pseudocode" or "Algorithm" was found.
Open Source Code | Yes | "Our code is available at: https://github.com/JRC1995/BeamRecursionFamily/"
Open Datasets | Yes | "ListOps was originally introduced by Nangia and Bowman [70]."; "Long Range Arena (LRA): LRA is a set of tasks designed to evaluate the capacities of neural models for modeling long-range dependencies [92]."; "Logical Inference was introduced by Bowman et al. [6]."
Dataset Splits | Yes | "We use the original development set for validation. We test on the original test set (near-IID split); the length generalization splits from Havrylov et al. [38]..."; "Our actual development set is a random sample of 10,000 data points from the filtered training set."
Hardware Specification | Yes | "Table 1: Empirical time and (peak) memory consumption for various models on an RTX A6000."; "All models were trained on a single Nvidia RTX A6000."
Software Dependencies | No | The paper mentions software such as S4D, but does not provide specific version numbers for underlying libraries or programming languages (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "For RIR-EBT-GRC, we use a beam size of 7 for all tasks except the LRA Retrieval task, where we use a beam size of 5. All other hyperparameters are unchanged from BT-GRC for the RIR models, or from BBT-GRC for the earlier tasks. ... We initialize S4D (whether using the pure S4D model or S4D for pre-chunk processing) in S4D-Inv mode and use bilinear discretization. For the LRA tasks, we use the same hyperparameters as Gu et al. [30] for S4D."
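The Experiment Setup row above can be condensed into a small configuration sketch. This is only an illustration of the reported settings: the function name, dictionary keys, and task identifier below are hypothetical, not the authors' actual configuration schema; only the numeric values and mode names come from the quoted text.

```python
# Hypothetical summary of the setup quoted above; names are illustrative.
BASE_BEAM_SIZE = 7       # RIR-EBT-GRC beam size for most tasks
RETRIEVAL_BEAM_SIZE = 5  # exception: the LRA Retrieval task

def beam_size(task: str) -> int:
    """Return the beam size reported for a given task (task IDs are made up)."""
    return RETRIEVAL_BEAM_SIZE if task == "lra_retrieval" else BASE_BEAM_SIZE

# S4D settings described in the paper, as a plain dict.
S4D_CONFIG = {
    "init": "s4d-inv",             # S4D-Inv initialization mode
    "discretization": "bilinear",  # bilinear discretization
}

print(beam_size("listops"), beam_size("lra_retrieval"))  # 7 5
```

The point of the sketch is simply that one beam-size value is a task-specific override while everything else stays at the base configuration inherited from BT-GRC/BBT-GRC.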