Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Authors: Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, Lidong Bing

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
Researcher Affiliation | Collaboration | 1 DAMO Academy, Alibaba Group, Singapore; 2 Nanyang Technological University; 3 Singapore University of Technology and Design; 4 Salesforce Research; 5 Hupan Lab, 310023, Hangzhou, China
Pseudocode | No | The paper describes the framework components and steps in detail using text and a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/DAMO-NLP-SG/chain-of-knowledge.
Open Datasets | Yes | We collect a set of knowledge-intensive tasks from various domains, including FEVER (Thorne et al., 2018), HotpotQA (Yang et al., 2018), and FeTaQA (Nan et al., 2022) in the factual domain; MedMCQA (Pal et al., 2022) in the medical domain; and Physics and Biology tests from MMLU (Hendrycks et al., 2021) in the physics and biology domains. Details are in Appendix E.
Dataset Splits | Yes | The size of the dataset is provided in Table 8 (referring to the instruction-tuning datasets). Table 8 itself shows 'Train. Set' and 'Eval. Set' columns with numerical values for various datasets (e.g., LC-quad & KQA-pro: 19,010 train, 4,779 eval).
Hardware Specification | Yes | For each knowledge source, the model is trained for 3 epochs, utilizing an NVIDIA A40 GPU.
Software Dependencies | Yes | We employ Llama-2 (meta-llama/Llama-2-7b-hf) as the base model. We utilize LoRA for parameter-efficient fine-tuning, and load the weights in 8-bit format.
Experiment Setup | Yes | Except for the self-consistency step, we set the temperature to 0.7, allowing for the sampling of five rationales and answers, as recommended by Wang et al. (2023). For each knowledge source, the model is trained for 3 epochs, utilizing an NVIDIA A40 GPU. We maintain a training batch size of 32, with a gradient accumulation step set at 2.
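The self-consistency step quoted above (sampling five rationales and answers, per Wang et al., 2023) amounts to a majority vote over sampled answers. A minimal sketch follows; `mock_sample` is a hypothetical stand-in for a real LLM call at temperature 0.7, and the actual CoK pipeline is in the linked repository, not reproduced here.

```python
from collections import Counter

def self_consistency(sample_fn, question, n_samples=5):
    """Sample several rationale/answer pairs and return the majority
    answer with its vote count (self-consistency, Wang et al., 2023)."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

# Hypothetical stand-in for five stochastic LLM samples; a real run
# would query the model at temperature 0.7 and parse each final answer.
canned = iter(["Paris", "Lyon", "Paris", "Paris", "Lyon"])
def mock_sample(question):
    return next(canned)

answer, votes = self_consistency(mock_sample, "What is the capital of France?")
print(answer, votes)  # Paris 3
```

With a deterministic `sample_fn` all five votes agree; the sampled temperature only matters because it makes the five rationales diverse enough for voting to filter out stray errors.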