Tracr: Compiled Transformers as a Laboratory for Interpretability

Authors: David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Tom McGrath, Vladimir Mikulik

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study superposition in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground truth for evaluating interpretability methods. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking. Our main contributions are to: (3) Provide a case study where we examine superposition in Tracr models compressed using gradient descent (Section 5). We confirm key observations by Elhage et al. (2022b) in a new setting: compressed models drop unnecessary features, and represent less important features in superposition. We present two case studies of compressing compiled models using the frac_prevs and the sort_unique programs from Section 4. These sketch how Tracr can be practically useful in advancing interpretability research. Appendix E contains more details on the training setup, hyperparameters, and resources used. (A minimal compilation sketch for the frac_prevs program is given after this table.)
Researcher Affiliation | Industry | David Lindner (Google DeepMind); János Kramár (Google DeepMind); Sebastian Farquhar (Google DeepMind); Matthew Rahtz (Google DeepMind); Thomas McGrath (Google DeepMind); Vladimir Mikulik (Google DeepMind)
Pseudocode | No | The paper includes examples of RASP programs (e.g., Figure 2, Figure 5, Figure 9, Figure 10, Figure 11), which are code-like, but these are illustrative programs for the compiler, not pseudocode or algorithm blocks describing the Tracr compiler's internal mechanisms.
Open Source Code | Yes | We provide an open-source implementation of Tracr at https://github.com/google-deepmind/tracr.
Open Datasets | No | The paper uses RASP programs with specific input sequences as examples (e.g., "<bos>xacx") and focuses on compiling these. It does not mention or use any publicly available datasets in the traditional sense, nor does it provide access information for such datasets.
Dataset Splits | No | The paper discusses training for compression (Section 5.1, Appendix E) but does not provide specific training/validation/test splits for any dataset. It refers to inputs being processed by the model without detailing how data is partitioned for different phases.
Hardware Specification | Yes | Each compression run requires between 1 and 4 hours of run time on two CPU cores (depending on the size of the model to compress).
Software Dependencies | No | The paper mentions software components such as "JAX", "Haiku", and the "AdamW optimizer (implemented in Optax)", but does not specify version numbers for these dependencies, which reproducibility requires.
Experiment Setup | Yes | We train W using the AdamW optimizer (implemented in Optax) with a weight decay factor of 0.1, and parameters β1 = 0.9, β2 = 0.99. We train for 3 × 10^5 steps with a batch size of 256. We decay the learning rate linearly from 10^-3 to 10^-6 over the first half of training. (An illustrative Optax reconstruction of this setup is given after this table.)
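
To make the compiled-transformer workflow concrete, below is a minimal sketch of compiling the frac_prevs program with the open-source Tracr library, adapted from the example in the repository's README. The module paths and argument names used here (rasp, compiling.compile_rasp_to_model, compiler_bos) are assumed from that example and may differ between library versions.

# Sketch: compile the frac_prevs RASP program into a small transformer with Tracr.
# Adapted from the Tracr README example; exact API details may vary by version.
from tracr.rasp import rasp
from tracr.compiler import compiling


def make_frac_prevs(bools: rasp.SOp) -> rasp.SOp:
  """Fraction of positions up to and including the current one where `bools` is True."""
  prevs = rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.LEQ)
  return rasp.numerical(rasp.Aggregate(prevs, rasp.numerical(bools)))


# Count the running fraction of "x" tokens.
program = make_frac_prevs(rasp.tokens == "x")

# Compile the RASP program to a concrete transformer model.
model = compiling.compile_rasp_to_model(
    program,
    vocab={"w", "x", "y", "z"},
    max_seq_len=10,
    compiler_bos="bos",
)

# Run the compiled model on a token sequence; the first position is the BOS token.
out = model.apply(["bos", "w", "x", "x", "z"])
print(out.decoded)  # per-position fraction of "x" tokens seen so far

Because the compiler lays out the model's residual-stream features explicitly, the resulting weights serve as the known ground truth that the assessment above refers to.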
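The Experiment Setup row maps directly onto standard Optax primitives. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' released training code: the learning rate is assumed to stay at its end value for the second half of training, and the parameter W (arbitrary shape) and the dummy gradients stand in for the compression matrix and objective of Section 5, which are not reproduced here.

# Illustrative reconstruction of the compression training setup using Optax.
import jax.numpy as jnp
import optax

NUM_STEPS = 300_000   # 3 x 10^5 training steps, as quoted above
BATCH_SIZE = 256      # batch size, as quoted above

# Learning rate decays linearly from 1e-3 to 1e-6 over the first half of
# training (assumption: it then stays at 1e-6 for the remaining steps).
schedule = optax.linear_schedule(
    init_value=1e-3,
    end_value=1e-6,
    transition_steps=NUM_STEPS // 2,
)

# AdamW with weight decay 0.1 and beta1 = 0.9, beta2 = 0.99, as quoted above.
optimizer = optax.adamw(
    learning_rate=schedule,
    b1=0.9,
    b2=0.99,
    weight_decay=0.1,
)

# `W` stands in for the compression matrix trained in Section 5 of the paper;
# its shape here is arbitrary and only used to initialise the optimizer state.
params = {"W": jnp.zeros((16, 64))}
opt_state = optimizer.init(params)

# One illustrative update with dummy gradients; the real gradients would come
# from the paper's compression loss, which is not reproduced in this sketch.
grads = {"W": jnp.ones_like(params["W"])}
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)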