Transformers Represent Belief State Geometry in their Residual Stream
Authors: Adam Shai, Lucas Teixeira, Alexander Gietelink Oldenziel, Sarah Marzen, Paul Riechers
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this framework, we conduct well-controlled experiments where we train transformers on data generated from processes with hidden ground truth structure, and then use our theory to make predictions about the geometry of internal activations. Even in cases where the framework predicts highly nontrivial fractal structure, our empirical results confirm these predictions (Figure 1). |
| Researcher Affiliation | Collaboration | Adam S. Shai (Simplex, PIBBSS); Sarah E. Marzen (Department of Natural Sciences, Pitzer and Scripps College); Lucas Teixeira (PIBBSS); Alexander Gietelink Oldenziel (University College London, Timaeus); Paul M. Riechers (Simplex, BITS) |
| Pseudocode | No | The paper describes methods and procedures but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We have included code in the submission that reproduces all results. It should be noted that we will continue to work on cleaning up this code for final submission (though the code very much works as is and recreates the figures in the submission). |
| Open Datasets | No | The paper uses data generated by specific Hidden Markov Models (HMMs), the Mess3 and RRXOR processes, defined within the paper itself. It does not provide access information (link, DOI, specific repository, or citation to an external dataset source) for a publicly available or open dataset. (An illustrative HMM sampling and belief-state sketch follows the table.) |
| Dataset Splits | No | The paper mentions a 'normalized validation loss' (Figure S2), indicating that a validation set was used, and describes a 20%/80% train/test split for the *regression analysis* on activations (see the regression sketch after the table). However, it does not give the percentages or sample counts of the training, validation, and test splits for the *HMM-generated data* the transformer itself was trained on, so the data partitioning for the primary experiment cannot be reproduced. |
| Hardware Specification | No | The paper explicitly states in its NeurIPS checklist: 'These experiments are on small toy models and thus can be run on any modern hardware,' and therefore provides no specific hardware details for reproduction. |
| Software Dependencies | No | The paper mentions using the 'Transformer Lens library [22]' for analysis, but does not provide specific version numbers for this or any other software dependencies. It only mentions 'pip install -e .' for installation without listing explicit versioned dependencies. |
| Experiment Setup | Yes | In our experiments, we trained a transformer model using the following hyperparameters and training parameters: The model had a context window size of 10, used ReLU as the activation function, and had a head dimension of 8 and a model dimension of 64. There was 1 attention head in each of 4 layers. The model had MLPs of dimension 256 and used causal attention masking. Layer normalization was applied. For training, we used the Stochastic Gradient Descent (SGD) optimizer with a batch size of 64, running for 1,000,000 epochs, and a learning rate of 0.01 with no weight decay. (A configuration sketch based on these values follows the table.) |
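The paper's central object is the geometry of Bayesian belief states over the hidden states of the data-generating HMM. The snippet below is a minimal sketch of that pipeline: sampling token sequences from a small HMM and computing the belief-state trajectory by Bayesian filtering. The transition and emission probabilities here are illustrative placeholders, not the paper's Mess3 or RRXOR parameters, which are defined in the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Emission probabilities E[i, x] and state-transition probabilities A[i, j].
# These values are placeholders, NOT the paper's Mess3 or RRXOR parameters.
E = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

# Labeled transition matrices T[x][i, j] = P(emit x, move to state j | state i).
T = np.einsum('ix,ij->xij', E, A)
n_symbols, n_states, _ = T.shape

def sample_sequence(length):
    """Sample a token sequence by walking the HMM."""
    state = rng.integers(n_states)
    tokens = np.empty(length, dtype=np.int64)
    for t in range(length):
        probs = T[:, state, :].ravel()               # joint over (symbol, next state)
        idx = rng.choice(n_symbols * n_states, p=probs / probs.sum())
        tokens[t], state = divmod(idx, n_states)
    return tokens

def belief_trajectory(tokens):
    """Bayesian-filter the optimal observer's belief over hidden states."""
    M = T.sum(axis=0)                                # marginal transition matrix
    eigvals, eigvecs = np.linalg.eig(M.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    belief = pi / pi.sum()                           # stationary prior
    beliefs = []
    for x in tokens:
        belief = belief @ T[x]                       # Bayes update on observing x
        belief = belief / belief.sum()
        beliefs.append(belief.copy())
    return np.array(beliefs)                         # points on the probability simplex

tokens = sample_sequence(10)
print(belief_trajectory(tokens))
```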
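The 'Experiment Setup' row fully specifies the model dimensions and optimizer. One way to instantiate a model with those dimensions is through the Transformer Lens library, which the paper cites only for its analysis; whether the authors trained with this library is not stated, so treat this as a sketch rather than the authors' training code. The vocabulary size `d_vocab=3` is an assumption (one token per emission symbol of a three-symbol process such as Mess3).

```python
import torch
from transformer_lens import HookedTransformer, HookedTransformerConfig

# Dimensions taken from the 'Experiment Setup' row; d_vocab=3 is an assumption.
cfg = HookedTransformerConfig(
    n_layers=4,
    n_heads=1,
    d_model=64,
    d_head=8,
    d_mlp=256,
    n_ctx=10,
    d_vocab=3,
    act_fn="relu",
    normalization_type="LN",   # layer normalization
    attention_dir="causal",    # causal attention masking
)
model = HookedTransformer(cfg)

# Optimizer settings from the same row: SGD, lr=0.01, no weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.0)

# One illustrative training step on a batch of 64 length-10 token sequences;
# in practice the batch would come from the HMM sampler sketched above.
batch = torch.randint(0, 3, (64, 10))
loss = model(batch, return_type="loss")   # next-token cross-entropy
loss.backward()
optimizer.step()
optimizer.zero_grad()
```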
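The 'Dataset Splits' row quotes a 20%/80% train/test split for the regression from activations to belief states. As a hedged illustration of that kind of analysis (the paper's exact regression setup may differ), the sketch below fits an ordinary least-squares map from residual-stream activations to belief-state coordinates and evaluates it on held-out points; the stand-in arrays `X` and `Y` and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in arrays: in the paper's analysis these would be residual-stream
# activations of shape (n_positions, d_model) and the corresponding ground-truth
# belief states of shape (n_positions, n_hidden_states) from the trained model.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 64))           # placeholder activations, d_model = 64
Y = rng.dirichlet(np.ones(3), size=5000)  # placeholder belief states on the simplex

# 20% train / 80% test, matching the split ratio quoted in the table.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=0.2, test_size=0.8, random_state=0
)

# Ordinary least-squares map from activations to belief-state coordinates.
reg = LinearRegression().fit(X_train, Y_train)
Y_pred = reg.predict(X_test)

mse = np.mean((Y_pred - Y_test) ** 2)
print(f"held-out MSE: {mse:.6f}, R^2: {reg.score(X_test, Y_test):.4f}")
```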