Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
Authors: Andy Yang, David Chiang, Dana Angluin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper is purely theoretical, and we do not foresee any direct societal impacts. |
| Researcher Affiliation | Academia | Andy Yang (University of Notre Dame), David Chiang (University of Notre Dame), Dana Angluin (Yale University) |
| Pseudocode | Yes | A B-RASP program is a sequence of operations that compute new Boolean vectors. Although they may have descriptive names, and names may be reused, here, to streamline definitions and proofs, we assume that all the Boolean vectors are numbered consecutively. That is, P₁, …, P_{\|Σ\|} are the initial Boolean vectors Q_a for a ∈ Σ, and the Boolean vectors computed by the program are numbered starting from P_{\|Σ\|+1} without repetition. After the first t vectors, vector P_{t+1} is computed using one of the following operations. Position-wise operations: P_{t+1}(i) can be computed by P_{t+1}(i) := f(i), where f(i) is a Boolean combination of zero or more of {P₁(i), …, P_t(i)}. Attention operations: P_{t+1}(i) can be computed by either of P_{t+1}(i) := ◀ⱼ [M(i,j), S(i,j)] V(i,j) : D(i) or P_{t+1}(i) := ▶ⱼ [M(i,j), S(i,j)] V(i,j) : D(i). (A runnable sketch of these operations appears below the table.) |
| Open Source Code | Yes | A B-RASP simulator that allows one to write and run additional examples can be found at https://b-rasp.github.io/. |
| Open Datasets | No | This paper does not include any experimental results. |
| Dataset Splits | No | This paper does not include any experimental results. |
| Hardware Specification | No | This paper does not include any experimental results. |
| Software Dependencies | No | This paper does not include any experimental results. |
| Experiment Setup | No | This paper does not include any experimental results. |
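
To make the B-RASP definition quoted in the Pseudocode row concrete, here is a minimal interpreter sketch. It is our own illustration, not the authors' code: the names `initial_vectors`, `positionwise`, and `attend`, and the `a*b*` example, are hypothetical, and only the semantics of the position-wise and leftmost/rightmost (◀/▶) attention operations follow the quoted definition. The official simulator lives at https://b-rasp.github.io/.

```python
from typing import Callable

def initial_vectors(word: str, alphabet: str) -> dict[str, list[bool]]:
    """Q_a(i) is true iff the i-th input symbol is a."""
    return {a: [c == a for c in word] for a in alphabet}

def positionwise(f: Callable[[int], bool], n: int) -> list[bool]:
    """P_{t+1}(i) := f(i), a Boolean combination of earlier vectors at position i."""
    return [f(i) for i in range(n)]

def attend(
    n: int,
    mask: Callable[[int, int], bool],      # M(i, j): e.g. j < i (past) or j > i (future)
    score: Callable[[int, int], bool],     # S(i, j): which unmasked positions match
    value: Callable[[int, int], bool],     # V(i, j): value copied from the chosen j
    default: Callable[[int], bool],        # D(i): used when no j satisfies M and S
    leftmost: bool = True,
) -> list[bool]:
    """Leftmost (◀) or rightmost (▶) hard attention, per the quoted definition."""
    result = []
    for i in range(n):
        candidates = [j for j in range(n) if mask(i, j) and score(i, j)]
        if candidates:
            j = min(candidates) if leftmost else max(candidates)
            result.append(value(i, j))
        else:
            result.append(default(i))
    return result

# Hypothetical example: decide membership in a*b*, a star-free language.
# A word is in a*b* iff no 'b' is ever followed by a later 'a'.
word = "aabbb"
Q = initial_vectors(word, "ab")
n = len(word)
bad = attend(
    n,
    mask=lambda i, j: j > i,       # future-masked: only look rightward
    score=lambda i, j: Q["a"][j],  # attend to a later 'a', if any
    value=lambda i, j: Q["b"][i],  # flag positions holding 'b' before that 'a'
    default=lambda i: False,
    leftmost=True,
)
ok = positionwise(lambda i: not bad[i], n)  # position-wise negation of bad
print(all(ok))  # True for "aabbb"; False for e.g. "aba"
```

Running this on `"aabbb"` prints `True`, while `"aba"` yields `False`, matching membership in `a*b*`. The masked, tieless hard attention here (pick the single leftmost or rightmost matching position, else a default) is exactly the mechanism whose expressive power the paper characterizes as the star-free languages.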