Tracr: Compiled Transformers as a Laboratory for Interpretability
Authors: David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Tom McGrath, Vladimir Mikulik
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study superposition in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evaluating interpretability methods. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking. Our main contributions are to: (3) Provide a case study where we examine superposition in Tracr models compressed using gradient descent (Section 5). We confirm key observations by Elhage et al. (2022b) in a new setting: compressed models drop unnecessary features, and represent less important features in superposition. We present two case studies of compressing compiled models using the frac_prevs and the sort_unique programs from Section 4. These sketch how Tracr can be practically useful in advancing interpretability research. Appendix E contains more details on the training setup, hyperparameters, and resources used. (A minimal compilation sketch for frac_prevs follows this table.) |
| Researcher Affiliation | Industry | David Lindner Google DeepMind János Kramár Google DeepMind Sebastian Farquhar Google DeepMind Matthew Rahtz Google DeepMind Thomas McGrath Google DeepMind Vladimir Mikulik Google DeepMind |
| Pseudocode | No | The paper includes examples of RASP programs (e.g., Figure 2, Figure 5, Figure 9, Figure 10, Figure 11), which are code-like, but these are illustrative programs to be compiled rather than pseudocode or algorithm blocks detailing the Tracr compiler's internal mechanisms. |
| Open Source Code | Yes | We provide an open-source implementation of Tracr at https://github.com/google-deepmind/tracr. |
| Open Datasets | No | The paper uses RASP programs with specific input sequences as examples (e.g., "<bos>xacx") and focuses on compiling these. It does not mention or use any publicly available datasets in the traditional sense, nor does it provide access information for such datasets. |
| Dataset Splits | No | The paper discusses training for compression (Section 5.1, Appendix E) but does not provide specific training/validation/test splits for any dataset. It refers to inputs being processed by the model without detailing how data is partitioned for different phases. |
| Hardware Specification | Yes | Each compression run requires between 1 and 4 hours of run time on two CPU cores (depending on the size of the model to compress). |
| Software Dependencies | No | The paper mentions software components such as JAX, Haiku, and the AdamW optimizer (implemented in Optax), but does not specify version numbers for these dependencies, which would be needed for exact reproducibility. |
| Experiment Setup | Yes | We train W using the AdamW optimizer (implemented in Optax) with a weight decay factor of 0.1, and parameters β1 = 0.9, β2 = 0.99. We train for 3 × 10^5 steps with a batch size of 256. We decay the learning rate linearly from 10^-3 to 10^-6 over the first half of training. (An Optax sketch of this configuration follows the table.) |
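
As context for the Research Type and Open Source Code rows above, the sketch below compiles the frac_prevs RASP program with the open-source Tracr library and runs the resulting transformer. It follows the public repository's documented API (`rasp.Select`, `rasp.Aggregate`, `compiling.compile_rasp_to_model`); exact names and signatures may differ between library versions, so treat this as an illustrative sketch rather than the paper's exact code.

```python
# Sketch: compile a RASP program with Tracr and run the compiled transformer.
# Module paths follow the public tracr repository; names may change across versions.
from tracr.rasp import rasp
from tracr.compiler import compiling


def make_frac_prevs(bools: rasp.SOp) -> rasp.SOp:
  """Fraction of positions up to the current one where `bools` is True."""
  bools = rasp.numerical(bools)
  prevs = rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.LEQ)
  return rasp.numerical(rasp.Aggregate(prevs, bools, default=0))


# frac_prevs applied to the indicator "token == 'x'".
program = make_frac_prevs(rasp.Map(lambda t: t == "x", rasp.tokens))

# Compile the program into a concrete transformer with known weights.
model = compiling.compile_rasp_to_model(
    program,
    vocab={"a", "c", "x"},
    max_seq_len=5,
    compiler_bos="BOS",
)

# The compiled model is an ordinary transformer; apply it to a token sequence.
out = model.apply(["BOS", "x", "a", "c", "x"])
print(out.decoded)  # per-position fraction of 'x' tokens seen so far
```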
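
The Experiment Setup row quotes the compression training configuration. Below is a minimal sketch of that optimizer setup using the public Optax API; the parameter pytree and loss function are placeholders, not the paper's actual compression objective or model.

```python
# Sketch of the quoted setup: AdamW (Optax) with weight decay 0.1, beta1=0.9,
# beta2=0.99, batch size 256, and a learning rate decayed linearly from 1e-3
# to 1e-6 over the first half of 3e5 steps. Parameters and loss are placeholders.
import jax
import jax.numpy as jnp
import optax

total_steps = 300_000
batch_size = 256

# Linear decay over the first half of training; Optax holds the end value afterwards.
lr_schedule = optax.linear_schedule(
    init_value=1e-3,
    end_value=1e-6,
    transition_steps=total_steps // 2,
)

optimizer = optax.adamw(
    learning_rate=lr_schedule,
    b1=0.9,
    b2=0.99,
    weight_decay=0.1,
)

# Placeholder parameter pytree (the paper learns a compression matrix W of this kind).
params = {"W": jax.random.normal(jax.random.PRNGKey(0), (64, 16)) * 0.01}
opt_state = optimizer.init(params)


@jax.jit
def train_step(params, opt_state, batch):
  """One AdamW update on a placeholder squared-error loss."""
  def loss_fn(p):
    # Stand-in loss for illustration only; not the paper's compression objective.
    return jnp.mean((batch @ p["W"]) ** 2)

  grads = jax.grad(loss_fn)(params)
  updates, opt_state = optimizer.update(grads, opt_state, params)
  params = optax.apply_updates(params, updates)
  return params, opt_state
```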