Propagating Knowledge Updates to LMs Through Distillation
Authors: Shankar Padmanabhan, Yasumasa Onoe, Michael Zhang, Greg Durrett, Eunsol Choi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that this approach is more effective at propagating knowledge updates than finetuning and other gradient-based knowledge-editing methods. We evaluate our approach on two knowledge propagation benchmarks: ENTITY INFERENCES [32] and Entity Cloze by Date (ECBD) [31]. |
| Researcher Affiliation | Academia | Shankar Padmanabhan, Yasumasa Onoe, Michael J.Q. Zhang, Greg Durrett, Eunsol Choi; Department of Computer Science, The University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1 Knowledge Propagation Through Distillation |
| Open Source Code | Yes | Our code and data are available at https://github.com/shankarp8/knowledge_distillation. |
| Open Datasets | Yes | We evaluate our approach on two knowledge propagation benchmarks: ENTITY INFERENCES [32] and Entity Cloze by Date (ECBD) [31]. |
| Dataset Splits | Yes | All hyperparameter experiments were conducted using a validation set drawn from ECBD 2021. |
| Hardware Specification | Yes | All experiments were run using Quadro RTX 8000 GPUs with 48GB RAM. |
| Software Dependencies | No | The paper mentions using the Hugging Face Transformers library and the DeepSpeed library, but does not provide specific version numbers for these software dependencies or any other ancillary software. |
| Experiment Setup | Yes | We experimented with a variety of learning rates (from 1e-8 to 1e-4) and numbers of epochs (K) (between 1 and 20) across all experiments using a grid search. The specific values used can be found in Appendix B.1. For example, for both base LMs, we used a learning rate of 5e-4 for 10 epochs for fine-tuning on the definition sentence, and a learning rate of 5e-4 and 5 epochs for each of 5 sentences for distillation. For GPT-Neo-1.3B and GPT2-XL, we trained for 5 epochs with a learning rate of 3e-6 for fine-tuning. (A hedged sketch of the distillation step follows the table.) |
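The quoted setup describes distilling an entity definition into the base LM via transfer sentences: a teacher (the base LM with the definition in context) provides next-token distributions that a student copy of the base LM (without the definition in context) is trained to match. The sketch below is a minimal, hedged illustration of one such distillation step, not the authors' released implementation: the single hand-written transfer sentence, the `gpt2` stand-in for GPT2-XL/GPT-Neo-1.3B, the KL objective, and the AdamW optimizer are assumptions; the learning rate (5e-4) and epoch count (5) are taken from the quoted setup.

```python
# Minimal sketch of context distillation for propagating an entity definition.
# Assumptions: gpt2 as a small stand-in model, one hand-written transfer sentence
# (the paper generates transfer sentences from the model itself), and a per-token
# KL objective between teacher and student next-token distributions.
import copy
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper uses GPT-Neo-1.3B and GPT2-XL
tokenizer = AutoTokenizer.from_pretrained(model_name)
teacher = AutoModelForCausalLM.from_pretrained(model_name).eval()
student = copy.deepcopy(teacher)  # the student starts as a copy of the base LM
student.train()

# Hypothetical definition and transfer sentence, for illustration only.
definition = "Dracula Daily is an email newsletter that serializes Bram Stoker's novel Dracula."
transfer_sentence = "Subscribers of Dracula Daily receive portions of the novel by email."

def_ids = tokenizer(definition, return_tensors="pt").input_ids
# Tokenize the transfer sentence with a leading space so its token IDs are identical
# whether or not the definition precedes it.
trans_ids = tokenizer(" " + transfer_sentence, return_tensors="pt").input_ids
teacher_ids = torch.cat([def_ids, trans_ids], dim=1)  # teacher sees definition + transfer sentence
student_ids = trans_ids                               # student sees only the transfer sentence
len_def = def_ids.shape[1]

with torch.no_grad():
    # Teacher's next-token distributions over the transfer-sentence positions,
    # conditioned on the definition.
    teacher_logits = teacher(teacher_ids).logits[:, len_def:-1, :]
    teacher_probs = F.softmax(teacher_logits, dim=-1).flatten(0, 1)

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-4)  # lr from the quoted setup

for _ in range(5):  # 5 epochs per transfer sentence, matching the quoted setup
    # Student predictions for the same transfer-sentence tokens, without the definition.
    student_logits = student(student_ids).logits[:, :-1, :]
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1).flatten(0, 1),
        teacher_probs,
        reduction="batchmean",  # mean KL per transfer-sentence token
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's full procedure this loop would run over several model-generated transfer sentences per entity, so the sketch corresponds to a single sentence's worth of updates.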