The Illusion of State in State-Space Models

Authors: William Merrill, Jackson Petty, Ashish Sabharwal

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our analysis reveals that the expressive power of S4, Mamba, and related SSMs is limited very similarly to that of transformers (within TC0), meaning these SSMs cannot solve simple state-tracking problems like permutation composition (see the sketch after this table) and are consequently provably unable to accurately track chess moves in certain notation, evaluate code, or track entities in a long narrative. To supplement our formal analysis, we report experiments showing that S4 and Mamba indeed struggle with state tracking.
Researcher Affiliation | Collaboration | New York University; Allen Institute for AI. Correspondence to: William Merrill <willm@nyu.edu>, Jackson Petty <petty@nyu.edu>, Ashish Sabharwal <ashishs@allenai.org>.
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: http://jpetty.org/ssm-illusion
Open Datasets | No | The paper mentions generating sequences from mathematical groups (A5, A4 × Z5, or Z60) and including '3600 pairwise sequences of length 2 in the training data' (a hypothetical reconstruction appears after this table), but it does not provide concrete access information (link, DOI, citation) for a publicly available dataset.
Dataset Splits | No | The paper mentions evaluating 'validation accuracy' but does not specify exact percentages or counts for the training, validation, and test splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | No | The paper describes the task (token tagging), model initialization (e.g., 'affine projection α as a random normal centered around the identity'; a sketch follows this table), and training process (e.g., 'train models on sequences of length n'), but it lacks specific hyperparameter values such as learning rate, batch size, or number of epochs.
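To make the state-tracking task in the Research Type row concrete, here is a minimal Python sketch (not the authors' code) of permutation composition as token tagging: each input token is a permutation, and the label at position t is the composition of the first t+1 tokens. The paper proves that TC0-bounded models cannot solve this for non-solvable groups; the sketch below uses S5 for illustration, while the experiments use order-60 groups such as A5.

```python
# Minimal sketch of the permutation-composition state-tracking task.
# Each label is the running group state after reading the prefix so far.
import itertools
import random

# All 120 permutations of {0,...,4} (the symmetric group S5).
S5 = list(itertools.permutations(range(5)))

def compose(p, q):
    """Apply q first, then p: (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def make_example(length, rng=random):
    """Sample a sequence of permutations and its prefix compositions.

    Returns (tokens, labels), where labels[t] is the composition of
    tokens[0..t], i.e., the group state after t+1 input tokens.
    """
    tokens = [rng.choice(S5) for _ in range(length)]
    labels, state = [], tuple(range(5))  # start from the identity
    for p in tokens:
        state = compose(p, state)
        labels.append(state)
    return tokens, labels

tokens, labels = make_example(8)
print(labels[-1])  # final group state after composing all 8 tokens
```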
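The Open Datasets row quotes '3600 pairwise sequences of length 2'. All three groups named there have order 60 (|A5| = |A4 × Z5| = |Z60| = 60), so 60 × 60 = 3600 covers every ordered pair of group elements. The following is a hypothetical reconstruction of such data, not the released code; it uses Z60 for brevity, and the sequence length n and sample count are placeholders.

```python
# Hypothetical reconstruction of the group-sequence training data:
# sequences over a group of order 60, tagged with the running product.
import itertools
import random

GROUP = list(range(60))  # Z60: integers under addition mod 60

def op(a, b):
    """The group operation of Z60."""
    return (a + b) % 60

def label_prefixes(seq):
    """Tag each position with the product of all elements so far."""
    out, state = [], 0  # 0 is the identity of Z60
    for g in seq:
        state = op(state, g)
        out.append(state)
    return out

# All 60 * 60 = 3600 length-2 sequences, included in training per the paper.
pairs = [list(p) for p in itertools.product(GROUP, repeat=2)]

# Longer random training sequences; n and the count are assumed values.
n = 16
long_seqs = [[random.choice(GROUP) for _ in range(n)] for _ in range(1000)]

train = [(s, label_prefixes(s)) for s in pairs + long_seqs]
print(len(pairs))  # 3600
```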
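Finally, the Experiment Setup row quotes initializing an 'affine projection α as a random normal centered around the identity'. Below is a hedged PyTorch sketch of one way to realize that phrase; the function name and the noise scale `std` are assumptions, since the paper (per the row above) does not report these hyperparameters.

```python
# Sketch (assumed details, not the authors' implementation) of initializing
# an affine projection as identity plus small Gaussian noise.
import torch
import torch.nn as nn

def init_affine_near_identity(d, std=0.02):
    """Return an nn.Linear whose weight is identity + N(0, std^2) noise.

    `std` is a hypothetical noise scale; the paper does not report one.
    """
    layer = nn.Linear(d, d)
    with torch.no_grad():
        layer.weight.copy_(torch.eye(d) + std * torch.randn(d, d))
        layer.bias.zero_()
    return layer

alpha = init_affine_near_identity(64)
x = torch.randn(2, 64)
print(alpha(x).shape)  # torch.Size([2, 64]); near-identity map at init
```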