Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages

Authors: Andy Yang, David Chiang, Dana Angluin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper is purely theoretical, and we do not foresee any direct societal impacts.
Researcher Affiliation | Academia | Andy Yang (University of Notre Dame), David Chiang (University of Notre Dame), Dana Angluin (Yale University)
Pseudocode | Yes | A B-RASP program is a sequence of operations that compute new Boolean vectors. Although they may have descriptive names, and names may be reused, here, to streamline definitions and proofs, we assume that all the Boolean vectors are numbered consecutively. That is, P_1, ..., P_{|Σ|} are the initial Boolean vectors Q_σ for σ ∈ Σ, and the Boolean vectors computed by the program are numbered starting from P_{|Σ|+1} without repetition. After the first t vectors, vector P_{t+1} is computed using one of the following operations. Position-wise operations: P_{t+1}(i) can be computed by P_{t+1}(i) := R(i), where R(i) is a Boolean combination of zero or more of {P_1(i), ..., P_t(i)}. Attention operations: P_{t+1}(i) can be computed by either of P_{t+1}(i) := leftmost_j [M(i,j), S(i,j)] V(i,j) : D(i) or P_{t+1}(i) := rightmost_j [M(i,j), S(i,j)] V(i,j) : D(i).
Open Source Code | Yes | A B-RASP simulator that allows one to write and run additional examples can be found at https://b-rasp.github.io/.
Open Datasets | No | This paper does not include any experimental results.
Dataset Splits | No | This paper does not include any experimental results.
Hardware Specification | No | This paper does not include any experimental results.
Software Dependencies | No | This paper does not include any experimental results.
Experiment Setup | No | This paper does not include any experimental results.
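The pseudocode quoted above is concrete enough to interpret directly. The sketch below is an independent illustration (not the authors' simulator at b-rasp.github.io): it implements the two attention operations as a scan that, for each position i, returns V(i, j) at the first unmasked position j where the score holds, else the default D(i), and then uses them in a small B-RASP-style program for the star-free language a*b* (strings with no "ba" factor). All function and variable names here are my own.

```python
# Illustrative B-RASP-style interpreter; Boolean vectors are Python lists.
def attend(n, M, S, V, D, leftmost=True):
    """Hard attention: for each i, take V(i, j) at the first j (scanning left-to-right
    if leftmost, else right-to-left) with M(i, j) and S(i, j) true; default to D(i)."""
    out = []
    for i in range(n):
        js = range(n) if leftmost else range(n - 1, -1, -1)
        out.append(next((V(i, j) for j in js if M(i, j) and S(i, j)), D(i)))
    return out

def recognizes_a_star_b_star(w):
    """B-RASP-style program for a*b*, a star-free language over {a, b}."""
    n = len(w)
    Q_a = [c == 'a' for c in w]   # initial Boolean vectors Q_sigma
    Q_b = [c == 'b' for c in w]
    # b_before(i): past-masked attention, true iff some j < i reads 'b'.
    b_before = attend(n, M=lambda i, j: j < i, S=lambda i, j: Q_b[j],
                      V=lambda i, j: True, D=lambda i: False)
    # bad(i): position-wise operation, an 'a' preceded by some 'b' ("ba" factor).
    bad = [Q_a[i] and b_before[i] for i in range(n)]
    # any_bad(i): unmasked attention, true iff some position is bad.
    any_bad = attend(n, M=lambda i, j: True, S=lambda i, j: bad[j],
                     V=lambda i, j: True, D=lambda i: False)
    # Accept iff no position witnesses a "ba" factor.
    return n == 0 or not any_bad[n - 1]
```

For example, `recognizes_a_star_b_star("aab")` accepts while `recognizes_a_star_b_star("aba")` rejects, since the latter contains the factor "ba".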