SelfIE: Self-Interpretation of Large Language Model Embeddings

Authors: Haozhe Chen, Carl Vondrick, Chengzhi Mao

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our visualizations and empirical results demonstrate that our interpretation framework faithfully conveys information in hidden embeddings and reveals internal reasoning procedures in LLMs. SelfIE achieves the same performance on eliciting LLMs' internal representation of world state in TextWorld (Côté et al., 2019) as a prior supervised approach (Li et al., 2021) trained on 100-shot samples, demonstrating the effectiveness and faithfulness of our zero-shot readout approach.
Researcher Affiliation | Academia | 1 Department of Computer Science, Columbia University, New York, NY; 2 Mila, Montreal, Canada; 3 McGill University, Montreal, Canada.
Pseudocode | No | The paper describes methods with textual descriptions and equations but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper mentions 'selfie.cs.columbia.edu' but does not explicitly state that source code for the described methodology is released at that link, nor does it provide a direct link to a code repository.
Open Datasets | Yes | SelfIE achieves the same performance on eliciting LLMs' internal representation of world state in TextWorld (Côté et al., 2019) as a prior supervised approach (Li et al., 2021) trained on 100-shot samples... We test the efficiency of supervised control of reasoning on editing knowledge in a model with the CounterFact dataset (Meng et al., 2022)... when calculating loss, we use WikiText (Merity et al., 2016) as a reference corpus.
Dataset Splits | Yes | We generate 12,900 samples of (context, entity, positive state, negative state). We show sample data in Appendix A.1. We use 3,400 samples for evaluating SelfIE and linear probing and use 9,500 for training linear probes.
Hardware Specification | Yes | We use 8 NVIDIA RTX A6000 GPUs for interpretation and 8 NVIDIA A100 GPUs for reasoning control.
Software Dependencies | No | The paper mentions LLaMA-2-70B-Chat but does not specify versions for other key software components or libraries required for reproducibility.
Experiment Setup | Yes | We used the Adam optimizer with learning rate 3e-3. We update parameters 10 times. For both the Molotov cocktail example and ethical control, we used the Adam optimizer with learning rate 3e-4.
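The reported optimization recipe (Adam, learning rate 3e-3, 10 parameter updates) can be sketched as a minimal self-contained loop. This is not the authors' code: the `adam_steps` function, the scalar parameter, and the toy quadratic loss standing in for the paper's control objective are all illustrative assumptions.

```python
# Minimal sketch of the reported recipe: Adam with lr=3e-3, 10 updates.
# The quadratic toy loss L(theta) = (theta - 1)^2 is a stand-in for the
# paper's actual control objective (assumption, not the authors' setup).

def adam_steps(grad_fn, theta, lr=3e-3, steps=10,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a scalar parameter `theta`."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (v_hat ** 0.5 + eps)
    return theta

# Gradient of L(theta) = (theta - 1)^2 is 2 * (theta - 1).
theta_final = adam_steps(lambda th: 2.0 * (th - 1.0), theta=0.0)
print(theta_final)  # moves from 0.0 toward the minimum at 1.0
```

With a consistent gradient sign, Adam's bias-corrected step size is roughly the learning rate itself, so 10 updates at 3e-3 move the parameter only a small, controlled distance; this matches the paper's use of few, small edits rather than full retraining.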