Thinking Like Transformers

Authors: Gail Weiss, Yoav Goldberg, Eran Yahav

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 5, we show how a compiled RASP program can indeed be realised in a neural transformer (as in Figure 1), and occasionally is even the solution found by a transformer trained on the task using gradient descent (Figs 5 and 4). We evaluate the relation of RASP to transformers on three fronts: 1. its ability to upper bound the number of heads and layers required to solve a task, 2. the tightness of that bound, 3. its feasibility in a transformer, i.e., whether a sufficiently large transformer can encode a given RASP solution. The evaluation involves training several transformers.
Researcher Affiliation | Collaboration | (1) Technion, Haifa, Israel; (2) Bar Ilan University, Ramat Gan, Israel; (3) Allen Institute for AI. Correspondence to: Gail Weiss <sgailw@cs.technion.ac.il>.
Pseudocode | Yes | Figure 1: We consider double-histogram, the task of counting for each input token how many unique input tokens have the same frequency as itself... (a) shows a RASP program for this task... Figure 3: RASP program for the task shuffle-dyck-2 (balance 2 parenthesis pairs, independently of each other)... (A plain-Python illustration of the double-histogram task appears after this table.)
Open Source Code | Yes | Code: We provide a RASP read-evaluate-print-loop (REPL) in http://github.com/tech-srl/RASP, along with a RASP cheat sheet and link to replication code for our work.
Open Datasets | No | The paper defines specific tasks like "Reverse" or "Histograms" with examples, implying that data was generated for these tasks, but it does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset used for training. It does not name standard benchmark datasets with citations.
Dataset Splits | No | The paper refers to "test accuracy" but does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits with citations).
Hardware Specification | No | The provided text excerpt does not contain any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It mentions that details are relegated to Appendix A, which is not included in the provided text.
Software Dependencies | No | The provided text excerpt does not mention specific software dependencies with version numbers. It mentions that details are relegated to Appendix A, which is not included in the provided text.
Experiment Setup | No | The paper states, "We relegate the exact details of the transformers and their training to Appendix A." The provided text excerpt, which ends before Appendix A, therefore does not contain specific experimental setup details such as hyperparameters or training configurations.
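
For concreteness, here is a plain-Python sketch of what the double-histogram program quoted in the Pseudocode row computes. This is not the authors' RASP code or their compiled transformer: the helper names select, selector_width and double_histogram are hypothetical and merely mimic the RASP primitives described in the paper (select builds a boolean attention pattern; selector_width counts how many positions each query selects).

    from typing import Callable, List, Sequence

    def select(keys: Sequence, queries: Sequence, pred: Callable) -> List[List[bool]]:
        # One row per query position: which key positions does it select?
        return [[pred(k, q) for k in keys] for q in queries]

    def selector_width(sel: List[List[bool]]) -> List[int]:
        # Number of selected positions per query (analogue of RASP's selector_width).
        return [sum(row) for row in sel]

    def double_histogram(tokens: Sequence[str]) -> List[int]:
        # hist: how often each position's token occurs in the input.
        hist = selector_width(select(tokens, tokens, lambda k, q: k == q))
        # first: is this position the first occurrence of its token?
        first = [tokens.index(t) == i for i, t in enumerate(tokens)]
        # For each position, count the first occurrences whose token has the same
        # frequency, i.e. the number of unique tokens sharing that frequency.
        same_freq_first = select(list(zip(hist, first)), hist,
                                 lambda k, q: k[0] == q and k[1])
        return selector_width(same_freq_first)

    print(double_histogram(list("hello")))  # [3, 3, 1, 1, 3]

In "hello", three distinct tokens (h, e, o) occur once and one (l) occurs twice, giving the output above. In the paper, compositions of such select/aggregate operations are what RASP compiles into attention heads and layers; this sketch only reproduces the task's input/output behaviour.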