Knowledge Circuits in Pretrained Transformers
Authors: Yunzhi Yao, Ningyu Zhang, Zekun Xi, Mengru Wang, Ziwen Xu, Shumin Deng, Huajun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments, conducted with GPT-2 and TinyLLaMA, allowed us to observe how certain information heads, relation heads, and MLPs collaboratively encode knowledge within the model. |
| Researcher Affiliation | Collaboration | 1) Zhejiang University; 2) National University of Singapore, NUS-NCS Joint Lab, Singapore; 3) Zhejiang Key Laboratory of Big Data Intelligent Computing |
| Pseudocode | No | The paper describes methods and processes in textual form and through mathematical equations but does not include any dedicated pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Code and data are available at https://github.com/zjunlp/KnowledgeCircuits. |
| Open Datasets | Yes | We utilize the dataset provided by LRE [42] and consider different kinds of knowledge, including linguistic, commonsense, fact, and bias. |
| Dataset Splits | Yes | To evaluate completeness, we first construct the circuit using the validation data Dval for a specific knowledge type and then test its performance on the test split Dtest in isolation. |
| Hardware Specification | Yes | We use the NVIDIA-A800 (40GB) to conduct our experiments. |
| Software Dependencies | No | The paper mentions using specific toolkits like the "Automated Circuit Discovery [32] toolkit" and "transformer-lens [41]", and frameworks like "EasyEdit [74]", but does not provide version numbers for these or other software dependencies (a loading sketch follows the table). |
| Experiment Setup | Yes | The primary hyperparameter for constructing a circuit is the threshold τ used to detect performance drops... In our experiment, we test τ values from the set {0.02, 0.01, 0.005} to determine the appropriate circuit size for different types of knowledge (a pruning sketch follows the table). |
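
Although the paper pins no versions, the toolkits it names fix the analysis stack. As a rough illustration of how its models are loaded for circuit analysis, here is a minimal sketch using the public `transformer_lens` package; the prompt is an illustrative stand-in, not taken from the paper:

```python
# Minimal sketch: load GPT-2 with activation hooks via transformer_lens,
# the interpretability library the paper cites (exact version unspecified).
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Run a factual-recall prompt and cache all intermediate activations,
# which is the raw material for attributing edges to a knowledge circuit.
# The prompt itself is a hypothetical example, not from the paper's data.
logits, cache = model.run_with_cache("The official language of France is")
print(model.tokenizer.decode(logits[0, -1].argmax()))
```

The returned `cache` holds per-layer attention-head and MLP outputs, which is what edge-level ablation in circuit discovery operates on.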
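The threshold τ in the setup row drives a keep-or-prune decision per edge of the computation graph. The sketch below is a hypothetical reconstruction of that loop under stated assumptions; `full_model_score` and `score_without_edge` are illustrative stand-ins, not the authors' API (their construction uses the Automated Circuit Discovery toolkit):

```python
# Hypothetical sketch of tau-thresholded circuit pruning. An edge is kept
# only if ablating it drops the matching metric by at least tau; the helper
# callables are stand-ins for the ACDC-style machinery the paper relies on.
def build_circuit(edges, full_model_score, score_without_edge, tau=0.01):
    """Return the subset of edges whose ablation costs >= tau in metric."""
    baseline = full_model_score()            # metric on the intact model
    circuit = set()
    for edge in edges:                       # walk the computation graph
        drop = baseline - score_without_edge(edge)
        if drop >= tau:                      # edge is load-bearing: keep it
            circuit.add(edge)
    return circuit

# The paper sweeps tau over {0.02, 0.01, 0.005}; a smaller tau keeps more
# edges, yielding a larger circuit for a given knowledge type.
```

Note that real ACDC prunes iteratively because ablation effects interact, so this single-pass version is a deliberate simplification for exposition.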