Reproducibility Index

Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Compositional Reasoning with Transformers, RNNs, and Chain of Thought

Authors: Gilad Yehudai, Noah Amsel, Joan Bruna

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We prove that under standard hardness assumptions, none of these three architectures is capable of solving CRQs unless some hyperparameter (depth, embedding dimension, and number of chain of thought tokens, respectively) grows with the size of the input. We then provide constructions for solving CRQs with each architecture. Our main contributions are as follows: 1. In Section 3.1, we present Compositional Reasoning Questions... 2. In Section 4, we prove that transformers with constant depth cannot solve arbitrary CRQs (Theorem 4.3), but transformers with depth L can solve all CRQs of depth up to L (Theorem 4.1). 3. In Section 5, we prove that RNNs with constant hidden dimension cannot solve arbitrary CRQs (Theorem 5.5), but RNNs with O(log n) hidden dimension and constant depth can solve all CRQs of size n (Theorem 5.4). 4. In Section 6, we prove that transformers augmented with O(log n) CoT tokens cannot solve CRQs of size n, but transformers augmented with O(n) CoT tokens can (Theorem 6.1).
Researcher Affiliation	Academia	Gilad Yehudai (Courant Institute of Mathematical Sciences, New York University); Noah Amsel (Courant Institute of Mathematical Sciences, New York University); Joan Bruna (Courant Institute of Mathematical Sciences and Center for Data Science, New York University; Center for Computational Mathematics, Flatiron Institute)
Pseudocode	Yes	Algorithm 1: Memory-Rank Sort
Open Source Code	No	The NeurIPS Paper Checklist for this submission states "This paper does not include experiments." and "No data or models are released." The paper contains no explicit statement about releasing code and no link to a repository.
Open Datasets	No	The NeurIPS Paper Checklist for this submission states "This paper does not include experiments." The paper text does not mention any dataset being used in experiments or made publicly available for download.
Dataset Splits	No	The paper does not conduct experiments and therefore does not discuss dataset splits.
Hardware Specification	No	The paper does not conduct experiments and therefore does not specify hardware used.
Software Dependencies	No	The paper does not conduct experiments and therefore does not specify software dependencies with version numbers.
Experiment Setup	No	The paper does not conduct experiments and therefore does not provide details about experimental setup or hyperparameters.