SelfIE: Self-Interpretation of Large Language Model Embeddings

Authors: Haozhe Chen, Carl Vondrick, Chengzhi Mao

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our visualizations and empirical results demonstrate that our interpretation framework faithfully conveys information in hidden embeddings and reveals internal reasoning procedures in LLMs. SelfIE achieves the same performance on eliciting LLMs' internal representation of world state in TextWorld (Côté et al., 2019) as a prior supervised approach (Li et al., 2021) trained on 100-shot samples, demonstrating the effectiveness and faithfulness of our zero-shot readout approach.
Researcher Affiliation | Academia | 1 Department of Computer Science, Columbia University, New York, NY; 2 Mila, Montreal, Canada; 3 McGill University, Montreal, Canada.
Pseudocode | No | The paper describes methods with textual descriptions and equations but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper mentions 'selfie.cs.columbia.edu' but does not explicitly state that source code for the described methodology is released at that link, nor does it provide a direct link to a code repository.
Open Datasets | Yes | SelfIE achieves the same performance on eliciting LLMs' internal representation of world state in TextWorld (Côté et al., 2019) as a prior supervised approach (Li et al., 2021) trained on 100-shot samples... We test the efficiency of supervised control of reasoning on editing knowledge in a model with the CounterFact dataset (Meng et al., 2022)... when calculating loss, we use WikiText (Merity et al., 2016) as a reference corpus.
Dataset Splits | Yes | We generate 12,900 samples of (context, entity, positive state, negative state). We show sample data in Appendix A.1. We use 3,400 samples for evaluating SelfIE and linear probing and use 9,500 for training linear probes.
Hardware Specification | Yes | We use 8 NVIDIA RTX A6000 GPUs for interpretation and 8 NVIDIA A100 GPUs for reasoning control.
Software Dependencies | No | The paper mentions LLaMA-2-70B-Chat but does not specify versions for other key software components or libraries required for reproducibility.
Experiment Setup | Yes | We used the Adam optimizer with learning rate 3e-3. We update parameters 10 times. For both the Molotov cocktail example and ethical control, we used the Adam optimizer with learning rate 3e-4.
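The reported optimization recipe (Adam, learning rate 3e-3, 10 parameter updates) can be sketched as a minimal self-contained loop. This is not the authors' code: the `adam_steps` function, the scalar parameter, and the toy quadratic loss standing in for the paper's control objective are all illustrative assumptions.

```python
# Minimal sketch of the reported recipe: Adam with lr=3e-3, 10 updates.
# The quadratic toy loss L(theta) = (theta - 1)^2 is a stand-in for the
# paper's actual control objective (assumption, not the authors' setup).

def adam_steps(grad_fn, theta, lr=3e-3, steps=10,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a scalar parameter `theta`."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (v_hat ** 0.5 + eps)
    return theta

# Gradient of L(theta) = (theta - 1)^2 is 2 * (theta - 1).
theta_final = adam_steps(lambda th: 2.0 * (th - 1.0), theta=0.0)
print(theta_final)  # moves from 0.0 toward the minimum at 1.0
```

With a consistent gradient sign, Adam's bias-corrected step size is roughly the learning rate itself, so 10 updates at 3e-3 move the parameter only a small, controlled distance; this matches the paper's use of few, small edits rather than full retraining.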