Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

Authors: Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | By evaluating on 40+ different tasks, we show that KiC-Large with 770M parameters easily outperforms large language models that are 4-39x larger. In addition, KiC also exhibits emergent abilities at a much smaller model scale compared to the fully-parametric models. |
| Researcher Affiliation | Industry | Tencent AI Lab, Bellevue, WA 98004, USA |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to using existing open-source models like MPNet ('We use All-MPNet-base-v2 as the encoder... and we use the publically available model checkpoint') but does not provide an explicit statement or link to the source code for the KiC model itself. (A retrieval sketch using this encoder follows the table.) |
| Open Datasets | Yes | We adopt the same setting as T0 (Sanh et al., 2022), where we train KiC models on a collection of tasks and then evaluate on another set of unseen tasks in a zero-shot manner. ... We train our KiC model on a mixture of multiple tasks (39 tasks in total) by combining and shuffling all training instances from different tasks (8.4M in total). (A task-mixture sketch follows the table.) |
| Dataset Splits | Yes | Following standard approaches, we choose the prompt that yields the best accuracy (%) on the validation set. ... we reproduce T0-Large with the same collection of tasks and evaluate KiC-Large on the validation set of each in-domain task (Table 4). (A prompt-selection sketch follows the table.) |
| Hardware Specification | Yes | Our final KiC-Large model is trained with 128 V100 GPUs for 42 hours. |
| Software Dependencies | No | The paper mentions software like T5, MPNet, and ScaNN, but does not provide specific version numbers for these or other underlying software libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The hyper-parameters for learning KiC-Base and KiC-Large are listed in Table 9. In addition, the hyper-parameters used for single-task finetuning are listed in Table 10. Table 9 includes: Learning Rate, Max. Input Length, Max. Output Length, Batch Size, α, # epochs, and Max. Knowledge Pieces. (A hyper-parameter skeleton follows the table.) |
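The Open Source Code row quotes the paper's use of the All-MPNet-base-v2 encoder for knowledge retrieval. Below is a minimal sketch of embedding and retrieving knowledge pieces with that checkpoint, assuming the sentence-transformers package is installed; the toy corpus, query, and top_k value are illustrative, and the brute-force search stands in for the ScaNN index the paper uses at scale.

```python
# Minimal retrieval sketch (assumes the sentence-transformers package).
# The brute-force search below is a stand-in for a ScaNN index at scale.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-mpnet-base-v2")  # public checkpoint named in the paper

# Toy knowledge pieces for illustration; the real knowledge corpus is far larger.
knowledge_pieces = [
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea level.",
]
corpus_emb = encoder.encode(knowledge_pieces, convert_to_tensor=True)

query = "Where is the Eiffel Tower?"
query_emb = encoder.encode(query, convert_to_tensor=True)

# Retrieve the top-k most relevant knowledge pieces for the query.
for hit in util.semantic_search(query_emb, corpus_emb, top_k=2)[0]:
    print(knowledge_pieces[hit["corpus_id"]], round(hit["score"], 3))
```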
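The Open Datasets row describes combining and shuffling training instances from many tasks into one mixture. The sketch below illustrates that idea under stated assumptions: it uses the Hugging Face datasets library, and the two GLUE tasks and prompt templates are placeholders rather than the paper's 39-task, 8.4M-instance mixture.

```python
# Sketch of a multitask training mixture: instances from several tasks are
# mapped to a shared (source, target) format, combined, and shuffled.
# Assumes the Hugging Face `datasets` library; tasks and prompts are placeholders.
from datasets import load_dataset, concatenate_datasets

def format_mrpc(ex):
    return {"source": f"Are these sentences paraphrases? {ex['sentence1']} {ex['sentence2']}",
            "target": str(ex["label"])}

def format_rte(ex):
    return {"source": f"Does the first sentence entail the second? {ex['sentence1']} {ex['sentence2']}",
            "target": str(ex["label"])}

mrpc = load_dataset("glue", "mrpc", split="train").map(
    format_mrpc, remove_columns=["sentence1", "sentence2", "label", "idx"])
rte = load_dataset("glue", "rte", split="train").map(
    format_rte, remove_columns=["sentence1", "sentence2", "label", "idx"])

# Combine all training instances from the different tasks and shuffle them
# into a single mixture, analogous to the paper's 39-task training set.
mixture = concatenate_datasets([mrpc, rte]).shuffle(seed=42)
print(len(mixture), mixture[0]["source"])
```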
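The Dataset Splits row notes that, per task, the prompt yielding the best validation accuracy is chosen. A minimal sketch of that selection rule follows; select_best_prompt, evaluate_prompt, and the dummy accuracies are hypothetical, not from the paper.

```python
# Sketch of validation-based prompt selection: for each task, keep the prompt
# template whose accuracy on the validation split is highest.
# `evaluate_prompt` is a hypothetical callable returning validation accuracy.
def select_best_prompt(prompts, evaluate_prompt):
    """Return the prompt with the highest validation accuracy."""
    scores = {prompt: evaluate_prompt(prompt) for prompt in prompts}
    return max(scores, key=scores.get)

# Illustrative usage with made-up accuracies:
dummy_scores = {"Prompt A": 0.61, "Prompt B": 0.67}
best = select_best_prompt(list(dummy_scores), dummy_scores.get)
print(best)  # -> "Prompt B"
```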
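The Experiment Setup row lists only the names of the hyper-parameters reported in Table 9. The skeleton below records those fields as a Python dict; every value is deliberately left as a placeholder because the actual numbers appear only in Tables 9 and 10 of the paper.

```python
# Skeleton of the hyper-parameter set named in Table 9; values are placeholders,
# not the paper's numbers, which are reported only in Tables 9 and 10.
kic_hparams = {
    "learning_rate": None,         # see Table 9
    "max_input_length": None,      # tokens; see Table 9
    "max_output_length": None,     # tokens; see Table 9
    "batch_size": None,            # see Table 9
    "alpha": None,                 # the α hyper-parameter; see Table 9
    "num_epochs": None,            # see Table 9
    "max_knowledge_pieces": None,  # presumably a cap on retrieved knowledge pieces; see Table 9
}
```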