InversionView: A General-Purpose Method for Reading Information from Neural Activations
Authors: Xinting Huang, Madhur Panwar, Navin Goyal, Michael Hahn
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present four case studies where we investigate models ranging from small transformers to GPT-2. In these studies, we show that InversionView can reveal clear information contained in activations, including basic information about tokens appearing in the context, as well as more complex information, such as the count of certain tokens, their relative positions, and abstract knowledge about the subject. We also provide causally verified circuits to confirm the decoded information. |
| Researcher Affiliation | Collaboration | Xinting Huang (Saarland University, xhuang@lst.uni-saarland.de); Madhur Panwar (EPFL, madhur.panwar@epfl.ch); Navin Goyal (Microsoft Research India, navingo@microsoft.com); Michael Hahn (Saarland University, mhahn@lst.uni-saarland.de) |
| Pseudocode | No | Does the paper contain STRUCTURED PSEUDOCODE OR ALGORITHM BLOCKS (clearly labeled algorithm sections or code-like formatted procedures)? Answer: [No] |
| Open Source Code | Yes | Code is available at https://github.com/huangxt39/InversionView |
| Open Datasets | Yes | To train the decoder model, we collect text from 3 datasets, including the factual statements from COUNTERFACT [37] and BEAR [56], as well as general text from MiniPile [30]. |
| Dataset Splits | Yes | We created 1.56M instances and applied a 75%-25% train-test split; test set accuracy is 99.53% (Details in Appendix D). |
| Hardware Specification | Yes | We ran all experiments on NVIDIA A100 cards. |
| Software Dependencies | No | Does the paper provide SPECIFIC ANCILLARY SOFTWARE DETAILS (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment? Answer: [No] Justification: No version information for software dependencies (e.g., PyTorch, CUDA) was found. |
| Experiment Setup | Yes | The model is trained with a batch size of 128 for 100 epochs, using a constant learning rate of 0.0005, weight decay of 0.01, and the AdamW [36] optimizer. |
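
The hyperparameters in the Experiment Setup row map onto a standard PyTorch optimizer configuration. The sketch below is only an illustration of that configuration, not the authors' training code: the model, dataset, and loss function are placeholders, while the batch size, epoch count, learning rate, weight decay, and choice of AdamW come from the row above.

```python
# Minimal sketch of the reported training configuration:
# batch size 128, 100 epochs, constant LR 5e-4, weight decay 0.01, AdamW.
# The model, data, and loss below are placeholders (not the paper's decoder).
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(512, 512)  # stand-in for the decoder model
train_dataset = TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512))

loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)

for epoch in range(100):  # 100 epochs with a constant learning rate
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
```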