Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Authors: Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, Lidong Bing

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
Researcher Affiliation | Collaboration | 1 DAMO Academy, Alibaba Group, Singapore; 2 Nanyang Technological University; 3 Singapore University of Technology and Design; 4 Salesforce Research; 5 Hupan Lab, 310023, Hangzhou, China
Pseudocode | No | The paper describes the framework components and steps in detail using text and a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/DAMO-NLP-SG/chain-of-knowledge.
Open Datasets | Yes | We collect a set of knowledge-intensive tasks from various domains, including FEVER (Thorne et al., 2018), HotpotQA (Yang et al., 2018), and FeTaQA (Nan et al., 2022) in the factual domain; MedMCQA (Pal et al., 2022) in the medical domain; and Physics and Biology tests from MMLU (Hendrycks et al., 2021) in the physics and biology domains. Details are in Appendix E.
Dataset Splits | Yes | The size of the dataset is provided in Table 8 (referring to the instruction-tuning datasets). Table 8 itself shows 'Train. Set' and 'Eval. Set' columns with numerical values for various datasets (e.g., LC-quad & KQA-pro: 19,010 train, 4,779 eval).
Hardware Specification | Yes | For each knowledge source, the model is trained for 3 epochs, utilizing an NVIDIA A40 GPU.
Software Dependencies | Yes | We employ Llama-2 (meta-llama/Llama-2-7b-hf) as the base model. We utilize LoRA for parameter-efficient fine-tuning, and load the weights in 8-bit format.
Experiment Setup | Yes | Except for the self-consistency step, we set the temperature to 0.7, allowing for the sampling of five rationales and answers, as recommended by Wang et al. (2023). For each knowledge source, the model is trained for 3 epochs, utilizing an NVIDIA A40 GPU. We maintain a training batch size of 32, with a gradient accumulation step set at 2.
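The self-consistency step quoted above (sampling five rationales and answers, per Wang et al., 2023) amounts to a majority vote over sampled answers. A minimal sketch follows; `mock_sample` is a hypothetical stand-in for a real LLM call at temperature 0.7, and the actual CoK pipeline is in the linked repository, not reproduced here.

```python
from collections import Counter

def self_consistency(sample_fn, question, n_samples=5):
    """Sample several rationale/answer pairs and return the majority
    answer with its vote count (self-consistency, Wang et al., 2023)."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

# Hypothetical stand-in for five stochastic LLM samples; a real run
# would query the model at temperature 0.7 and parse each final answer.
canned = iter(["Paris", "Lyon", "Paris", "Paris", "Lyon"])
def mock_sample(question):
    return next(canned)

answer, votes = self_consistency(mock_sample, "What is the capital of France?")
print(answer, votes)  # Paris 3
```

With a deterministic `sample_fn` all five votes agree; the sampled temperature only matters because it makes the five rationales diverse enough for voting to filter out stray errors.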