Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models

Authors: Shangbin Feng, Weijia Shi, Yuyang Bai, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we demonstrate that KNOWLEDGE CARD achieves state-of-the-art performance on six benchmark datasets. Ultimately, the KNOWLEDGE CARD framework enables dynamic synthesis and updates of knowledge from diverse domains."
Researcher Affiliation | Academia | Shangbin Feng (1), Weijia Shi (1), Yuyang Bai (2), Vidhisha Balachandran (3), Tianxing He (1), Yulia Tsvetkov (1); (1) University of Washington, (2) Xi'an Jiaotong University, (3) Carnegie Mellon University
Pseudocode | Yes | Algorithm 1: Bottom-Up Approach ... Algorithm 2: Top-Down Approach (a minimal sketch of both control flows follows the table)
Open Source Code | Yes | "Resources are available at https://github.com/BunsenFeng/Knowledge_Card."
Open Datasets | Yes | "For general-purpose QA, we adopt MMLU (Hendrycks et al., 2020)... To evaluate multi-domain knowledge synthesis, we adopt misinformation detection... We leverage the widely adopted LUN misinformation detection dataset (Rashkin et al., 2017)..."
Dataset Splits | No | The paper mentions a '5-shot in-context learning setting' and an official 'demonstration set' for MMLU and MIDTERMQA, and '16-shot in-context learning' for LUN, which are used for few-shot prompting (a prompt-construction sketch follows the table). However, it does not specify a distinct validation split, with percentages or counts, for hyperparameter tuning or model selection.
Hardware Specification | Yes | "We used a GPU cluster with 16 NVIDIA A40 GPUs, 1988G memory, and 104 CPU cores for the experiments."
Software Dependencies | No | The paper lists specific models and tools used (e.g., OPT-1.3B, MPNet, Pegasus, Codex, FactKB, VitaminC) along with their citations, but it does not provide version numbers for the underlying software libraries or environment (e.g., Python, PyTorch/TensorFlow, CUDA).
Experiment Setup | Yes | "We present hyperparameter settings in Table 6." ... LEARNING RATE 2e-5, BATCH SIZE 32, MAX EPOCHS 10, OPTIMIZER ADAM, TEMPERATURE 0.1 (a hedged training-loop sketch follows the table)
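The Pseudocode row refers to the paper's two knowledge-integration procedures. The sketch below is only a minimal, hypothetical rendering of those two control flows; the callables (cards, relevance, summarize, is_factual, llm) are placeholders standing in for the specialized LMs, relevance filter, summarizer, factuality checker, and base LLM, not the authors' released implementation.

```python
# Hypothetical sketch of the bottom-up and top-down control flows (Algorithms 1 and 2).
# All callables are placeholders for the components described in the paper.

def bottom_up(query, cards, relevance, summarize, is_factual, llm, top_k=3):
    """Generate from every knowledge card, filter, and prepend to the prompt."""
    docs = [card(query) for card in cards]                          # one document per card
    docs = sorted(docs, key=lambda d: relevance(query, d), reverse=True)[:top_k]
    docs = [summarize(d) for d in docs]                             # length control
    docs = [d for d in docs if is_factual(d)]                       # drop unsupported text
    context = "\n".join(docs)
    return llm(f"Knowledge: {context}\nQuestion: {query}\nAnswer:")

def top_down(query, card_names, cards, llm):
    """Let the LLM decide whether it needs external knowledge and which card to consult."""
    need = llm(f"Question: {query}\nDo you need external knowledge? Answer yes or no:")
    if need.strip().lower().startswith("no"):
        return llm(f"Question: {query}\nAnswer:")
    choice = llm(f"Question: {query}\nAvailable knowledge domains: {', '.join(card_names)}\n"
                 "Which domain should be consulted? Answer with one domain name:")
    selected = cards.get(choice.strip(), next(iter(cards.values())))  # fall back to any card
    doc = selected(query)
    return llm(f"Knowledge: {doc}\nQuestion: {query}\nAnswer:")
```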
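The Dataset Splits row notes that evaluation relies on few-shot in-context learning (5-shot for MMLU and MIDTERMQA, 16-shot for LUN) rather than a tuned validation split. The snippet below only illustrates how such a k-shot multiple-choice prompt could be assembled from a demonstration set; the field names (question, choices, answer) are assumptions, not the paper's data schema.

```python
def build_few_shot_prompt(demonstrations, test_question, test_choices, k=5):
    """Assemble a k-shot multiple-choice prompt from a demonstration set.

    `demonstrations` is assumed to be a list of dicts with 'question',
    'choices', and 'answer' keys; the exact schema used by the authors may differ.
    """
    blocks = []
    for demo in demonstrations[:k]:
        options = "\n".join(f"{letter}. {text}"
                            for letter, text in zip("ABCD", demo["choices"]))
        blocks.append(f"Question: {demo['question']}\n{options}\nAnswer: {demo['answer']}")
    options = "\n".join(f"{letter}. {text}"
                        for letter, text in zip("ABCD", test_choices))
    blocks.append(f"Question: {test_question}\n{options}\nAnswer:")
    return "\n\n".join(blocks)
```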
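The Experiment Setup row quotes the hyperparameters from the paper's Table 6 (learning rate 2e-5, batch size 32, max 10 epochs, Adam, temperature 0.1). The PyTorch sketch below only shows where such values would plug into a generic training loop; the model, data, and loss are dummy placeholders, and applying the temperature to the logits is an assumption about how the reported value is used.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters as reported in the paper's Table 6.
LEARNING_RATE = 2e-5
BATCH_SIZE = 32
MAX_EPOCHS = 10
TEMPERATURE = 0.1  # assumed here to scale logits; the paper may use it elsewhere

model = torch.nn.Linear(128, 2)                                   # placeholder network
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(MAX_EPOCHS):
    for inputs, labels in loader:
        logits = model(inputs) / TEMPERATURE                      # temperature-scaled logits
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```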