InversionView: A General-Purpose Method for Reading Information from Neural Activations

Authors: Xinting Huang, Madhur Panwar, Navin Goyal, Michael Hahn

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present four case studies where we investigate models ranging from small transformers to GPT-2. In these studies, we show that InversionView can reveal clear information contained in activations, including basic information about tokens appearing in the context, as well as more complex information, such as the count of certain tokens, their relative positions, and abstract knowledge about the subject. We also provide causally verified circuits to confirm the decoded information.
Researcher Affiliation | Collaboration | Xinting Huang (Saarland University, xhuang@lst.uni-saarland.de); Madhur Panwar (EPFL, madhur.panwar@epfl.ch); Navin Goyal (Microsoft Research India, navingo@microsoft.com); Michael Hahn (Saarland University, mhahn@lst.uni-saarland.de)
Pseudocode | No | Does the paper contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures)? Answer: No
Open Source Code | Yes | Code is available at https://github.com/huangxt39/InversionView
Open Datasets | Yes | To train the decoder model, we collect text from 3 datasets, including the factual statements from COUNTERFACT [37] and BEAR [56], as well as general text from Mini Pile [30].
Dataset Splits | Yes | We created 1.56M instances and applied a 75%-25% train-test split; test set accuracy is 99.53% (Details in Appendix D).
Hardware Specification | Yes | We ran all experiments on NVIDIA A100 cards.
Software Dependencies | No | Does the paper provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment? Answer: No. Justification: No version information for software dependencies (e.g., PyTorch, CUDA) was found.
Experiment Setup | Yes | The model is trained with a batch size of 128 for 100 epochs, using a constant learning rate of 0.0005, weight decay of 0.01, and the AdamW [36] optimizer.
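The experiment-setup row above quotes concrete hyperparameters. A minimal PyTorch sketch of that configuration is shown below; the `nn.Linear` model is a hypothetical stand-in (the paper's decoder architecture is not specified in this report), while the batch size, epoch count, learning rate, and weight decay are taken from the quoted setup.

```python
import torch
from torch import nn

# Hypothetical stand-in model; the paper's decoder architecture is not given here.
model = nn.Linear(16, 16)

# Hyperparameters quoted from the paper's experiment setup.
BATCH_SIZE = 128
EPOCHS = 100
LR = 0.0005          # constant learning rate, i.e. no scheduler
WEIGHT_DECAY = 0.01

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=LR,
    weight_decay=WEIGHT_DECAY,
)
```

Note that because the learning rate is constant, no `torch.optim.lr_scheduler` is attached; the optimizer's initial `lr` is used for all 100 epochs.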