Thinking Like Transformers
Authors: Gail Weiss, Yoav Goldberg, Eran Yahav
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we show how a compiled RASP program can indeed be realised in a neural transformer (as in Figure 1), and occasionally is even the solution found by a transformer trained on the task using gradient descent (Figs 5 and 4). We evaluate the relation of RASP to transformers on three fronts: 1. its ability to upper bound the number of heads and layers required to solve a task, 2. the tightness of that bound, 3. its feasibility in a transformer, i.e., whether a sufficiently large transformer can encode a given RASP solution. The evaluation involves training several transformers. |
| Researcher Affiliation | Collaboration | (1) Technion, Haifa, Israel; (2) Bar Ilan University, Ramat Gan, Israel; (3) Allen Institute for AI. Correspondence to: Gail Weiss <sgailw@cs.technion.ac.il>. |
| Pseudocode | Yes | Figure 1: We consider double-histogram, the task of counting for each input token how many unique input tokens have the same frequency as itself... (a) shows a RASP program for this task... Figure 3: RASP program for the task shuffle-dyck-2 (balance 2 parenthesis pairs, independently of each other)... (A plain-Python sketch of these two tasks' semantics is given below the table.) |
| Open Source Code | Yes | Code We provide a RASP read-evaluate-print-loop (REPL) in http://github.com/tech-srl/RASP, along with a RASP cheat sheet and link to replication code for our work. |
| Open Datasets | No | The paper defines specific tasks like “Reverse” or “Histograms” with examples, implying that data was generated for these tasks, but it does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset used for training. It does not name standard benchmark datasets with citations. |
| Dataset Splits | No | The paper refers to “test accuracy” but does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits with citations). |
| Hardware Specification | No | The provided text excerpt does not contain any specific hardware details (e.g., GPU/CPU models, memory) used for the experiments. It notes that such details are relegated to Appendix A, which is not included in the excerpt. |
| Software Dependencies | No | The provided text excerpt does not name specific software dependencies with version numbers. It notes that such details are relegated to Appendix A, which is not included in the excerpt. |
| Experiment Setup | No | The paper states, “We relegate the exact details of the transformers and their training to Appendix A.” The provided text excerpt, which ends before Appendix A, therefore does not contain specific experimental setup details such as hyperparameters or training configurations. |
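
The Pseudocode row above quotes the paper's task definitions for double-histogram (Figure 1) and shuffle-dyck-2 (Figure 3). As a hedged illustration of what those tasks compute — not the paper's RASP programs, and with the two bracket pairs for shuffle-dyck-2 chosen arbitrarily as `()` and `{}` since the quoted text does not specify the symbols — a minimal plain-Python sketch of the task semantics might look like this:

```python
from collections import Counter

def double_histogram(tokens):
    """For each input token, count how many unique input tokens
    have the same frequency as itself (the Figure 1 task)."""
    freq = Counter(tokens)                    # token -> frequency
    tokens_per_freq = Counter(freq.values())  # frequency -> number of distinct tokens with it
    return [tokens_per_freq[freq[t]] for t in tokens]

def is_shuffle_dyck_2(s, pairs=(("(", ")"), ("{", "}"))):
    """True iff each parenthesis pair is balanced independently of the
    other (the Figure 3 task). The pair symbols here are an assumption."""
    for opener, closer in pairs:
        depth = 0
        for ch in s:
            if ch == opener:
                depth += 1
            elif ch == closer:
                depth -= 1
                if depth < 0:   # a closer appears before its opener
                    return False
        if depth != 0:          # unmatched openers remain
            return False
    return True

# Illustrative, made-up inputs (not examples taken from the paper):
assert double_histogram("aabcd") == [1, 1, 3, 3, 3]  # 'a' is the only token with frequency 2
assert is_shuffle_dyck_2("({)}")                     # each pair balances on its own
assert not is_shuffle_dyck_2("((}}")
```

These are reference implementations of the task semantics only; the paper's contribution is expressing such computations in RASP so that they map onto the heads and layers of a transformer.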