Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models
Authors: Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on 40+ different tasks, we show that KiC-Large with 770M parameters easily outperforms large language models that are 4-39x larger. In addition, KiC also exhibits emergent abilities at a much smaller model scale compared to the fully-parametric models. |
| Researcher Affiliation | Industry | Tencent AI Lab, Bellevue, WA 98004, USA |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to using existing open-source models like MPNet ('We use All-MPNet-base-v2 as the encoder... and we use the publically available model checkpoint') but does not provide an explicit statement or link to the source code for the KiC model itself. |
| Open Datasets | Yes | We adopt the same setting as T0 (Sanh et al., 2022), where we train KiC models on a collection of tasks and then evaluate on another set of unseen tasks in a zero-shot manner. ... We train our KiC model on a mixture of multiple tasks (39 tasks in total) by combining and shuffling all training instances from different tasks (8.4M in total). |
| Dataset Splits | Yes | Following standard approaches, we choose the prompt that yields the best accuracy (%) on the validation set. ... we reproduce T0-Large with the same collection of tasks and evaluate KiC-Large on the validation set of each in-domain task (Table 4). |
| Hardware Specification | Yes | Our final KiC-Large model is trained with 128 V100 GPUs for 42 hours. |
| Software Dependencies | No | The paper mentions software like T5, MPNet, and SCaNN, but does not provide specific version numbers for these or other underlying software libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The hyper-parameters for training KiC-Base and KiC-Large are listed in Table 9. In addition, the hyper-parameters of single-task finetuning are listed in Table 10. Table 9 includes: Learning Rate, Max. Input Length, Max. Output Length, Batch Size, α, # epoch, Max. Knowledge Pieces. |