Codebook Features: Sparse and Discrete Interpretability for Neural Networks

Authors: Alex Tamkin, Mohammad Taufeeque, Noah Goodman

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first validate codebook features on a finite state machine dataset with far more hidden states than neurons. (...) We then train Transformer language models with up to 410M parameters on two natural language datasets. (...) In Table 1, we report the accuracy of the resulting models both in terms of their language modeling loss, next token accuracy, and their ability to produce valid transitions of the FSM across a generated sequence. (...) We perform several ablation studies to identify the importance of different elements of our training method. (An FSM-dataset sketch appears below the table.)
Researcher Affiliation | Collaboration | Alex Tamkin (1), Mohammad Taufeeque (2), Noah D. Goodman (3). (...) (1) Anthropic; work performed while at Stanford University. (2) FAR AI. (3) Stanford University.
Pseudocode | No | No structured pseudocode or algorithm blocks were found within the paper. The methodology is described in prose and with diagrams (e.g., Figure 1, Figure 2) but not in a formal algorithmic format. (A hedged sketch of the described codebook layer appears below the table.)
Open Source Code | Yes | Our codebase and models are open-sourced at this URL.
Open Datasets | Yes | We finetune a small, 1-layer, 21 million parameter model on the TinyStories dataset of children's stories (Eldan & Li, 2023). (...) We also finetune a larger, 24-layer 410M parameter model on the WikiText-103 dataset, consisting of high-quality English-language Wikipedia articles (Merity et al., 2016). (...) For a pretrained model, we use the Pythia 410m parameter model, trained on the Pile dataset with deduplication (Biderman et al., 2023). (A dataset-loading sketch appears below the table.)
Dataset Splits | No | No specific dataset split percentages or sample counts for training, validation, and test sets are explicitly provided. The paper mentions training models and evaluating on an 'eval set' (Appendix D.2) but does not detail the splits for reproducibility.
Hardware Specification | Yes | Depending on the model, we use a batch size of 64 to 256 and between 1-4 A100 GPUs. (...) Computed on an A100 40GB GPU, with a batch size of 64 and over 100 batches. (A timing sketch appears below the table.)
Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper mentions using the 'Adam optimizer (Kingma & Ba, 2014)' and that 'inference can be sped up an additional amount through fast maximum inner product search (MIPS) algorithms such as FAISS (Johnson et al., 2019)'. (A FAISS lookup sketch appears below the table.)
Experiment Setup | Yes | We set λ to 1 in this work. (...) We train for 15k steps for most experiments. For the TinyStories dataset, we train for 100k steps. The sequence length for WikiText-103 is 1024, and for TinyStories it is 512. Depending on the model, we use a batch size of 64 to 256 (...) By default, codebooks have C = 10k codebook size unless otherwise specified. (...) We use a constant learning rate of 1e-3 with a batch size of 512 and train the models for 20,000 training steps. (...) We train for 100k steps with a batch size of 96, with learning rate warmup of 5% and linear cooldown to 0. (A configuration sketch appears below the table.)
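The Research Type row describes validating codebook features on a finite state machine (FSM) dataset with far more hidden states than neurons, and evaluating whether generated sequences follow valid FSM transitions. The sketch below shows one way such a dataset and validity metric could look; the state count, branching factor, and token encoding are illustrative assumptions, not the paper's exact construction.

```python
# Random FSM dataset and transition-validity check (illustrative only).
import random

def build_fsm(num_states: int = 5000, branching: int = 10, seed: int = 0):
    """Each state gets a random set of allowed successor states."""
    rng = random.Random(seed)
    return {s: rng.sample(range(num_states), branching) for s in range(num_states)}

def sample_walk(fsm, length: int = 64, seed: int = 0):
    """Random walk through the FSM; each visited state is emitted as one token."""
    rng = random.Random(seed)
    state = rng.randrange(len(fsm))
    walk = [state]
    for _ in range(length - 1):
        state = rng.choice(fsm[state])
        walk.append(state)
    return walk

def valid_transition_fraction(fsm, walk):
    """Fraction of consecutive pairs that are legal FSM transitions."""
    pairs = list(zip(walk, walk[1:]))
    return sum(1 for a, b in pairs if b in fsm[a]) / len(pairs)

fsm = build_fsm()
walk = sample_walk(fsm)
print(valid_transition_fraction(fsm, walk))  # 1.0 for ground-truth walks
```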
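The Pseudocode row notes that the method is described only in prose and figures. Based on that prose description (an activation is replaced by the sum of the k codebook vectors most similar to it under cosine similarity), a minimal PyTorch sketch might look as follows. The dimensions, the value of k, and the straight-through gradient trick are assumptions; the paper's full training objective (including the λ-weighted term mentioned in the Experiment Setup row) is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookLayer(nn.Module):
    """Replaces each activation with the sum of its top-k most similar codes."""

    def __init__(self, d_model: int = 512, num_codes: int = 10_000, k: int = 8):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, d_model) * 0.02)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); cosine similarity against every code
        sims = F.normalize(x, dim=-1) @ F.normalize(self.codes, dim=-1).T
        topk = sims.topk(self.k, dim=-1).indices        # (batch, seq, k)
        snapped = self.codes[topk].sum(dim=-2)          # sum of selected codes
        # Straight-through-style pass so earlier layers still receive gradients;
        # one plausible choice, not necessarily the paper's exact scheme.
        return snapped + x - x.detach()
```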
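The Open Datasets row names TinyStories, WikiText-103, and the Pythia 410M model. A sketch of pulling these public releases from the Hugging Face hub is below; the hub identifiers are assumptions about the standard public copies, not paths given by the authors.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tiny_stories = load_dataset("roneneldan/TinyStories")        # Eldan & Li, 2023
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")   # Merity et al., 2016

model_name = "EleutherAI/pythia-410m-deduped"                # Biderman et al., 2023
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```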
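The Hardware Specification row mentions measurements on an A100 40GB GPU with a batch size of 64, averaged over 100 batches. A simple timing loop in that style might look as follows; the sequence length, vocabulary size, and synchronization details are illustrative assumptions.

```python
import time
import torch

@torch.no_grad()
def seconds_per_batch(model, batch_size=64, seq_len=512, vocab_size=50_304,
                      n_batches=100, device="cuda"):
    """Average forward-pass time over n_batches of random token IDs."""
    model = model.to(device).eval()
    tokens = torch.randint(vocab_size, (batch_size, seq_len), device=device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_batches):
        model(tokens)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_batches
```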
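The Software Dependencies row quotes the paper's pointer to fast maximum inner product search (MIPS) libraries such as FAISS for speeding up code lookup. A minimal FAISS sketch with an exact inner-product index is below; the dimensions and top-k value are placeholders.

```python
import numpy as np
import faiss

d, num_codes, k = 512, 10_000, 8
codes = np.random.randn(num_codes, d).astype("float32")
queries = np.random.randn(64, d).astype("float32")

# Normalizing both sides makes inner product equivalent to cosine similarity.
faiss.normalize_L2(codes)
faiss.normalize_L2(queries)

index = faiss.IndexFlatIP(d)             # exact maximum inner product search
index.add(codes)
scores, ids = index.search(queries, k)   # top-k codes per query activation
```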
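The Experiment Setup row lists hyperparameters drawn from several different runs (λ = 1, C = 10k codes, 15k-100k steps, sequence lengths 512/1024, learning rate 1e-3, 5% warmup with linear cooldown for the 100k-step run). One way to collect them into a config object with the stated warmup/cooldown schedule is sketched below; the grouping into a single config and the optimizer wiring are assumptions, since different experiments use different settings.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainConfig:
    codebook_size: int = 10_000   # C = 10k codes per codebook
    lambda_coef: float = 1.0      # λ reported as 1
    lr: float = 1e-3
    total_steps: int = 100_000    # 15k for most runs, 100k for TinyStories
    batch_size: int = 96          # 64-512 depending on the experiment
    seq_len: int = 512            # 1024 for WikiText-103
    warmup_frac: float = 0.05     # 5% learning-rate warmup

def make_scheduler(optimizer, cfg: TrainConfig):
    """Linear warmup for warmup_frac of training, then linear cooldown to 0."""
    warmup = int(cfg.warmup_frac * cfg.total_steps)

    def lr_lambda(step):
        if step < warmup:
            return step / max(1, warmup)
        return max(0.0, (cfg.total_steps - step) / max(1, cfg.total_steps - warmup))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage (model is a placeholder):
# cfg = TrainConfig()
# optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)  # Adam per the paper
# scheduler = make_scheduler(optimizer, cfg)
```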