Retrieval is Accurate Generation
Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. We verify the effectiveness of our methods on a set of knowledge-intensive tasks and open-ended text generation tasks without fine-tuning. |
| Researcher Affiliation | Collaboration | Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi — School of ECE, Peking University; Tencent AI Lab. {cbw2021,chengxx}@stu.pku.edu.cn, zouyx@pku.edu.cn, thisisjcykcd@gmail.com, {leyangcui,victoriabi,shumingshi}@tencent.com |
| Pseudocode | No | The paper describes processes like the 'bootstrapping algorithm' in text but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology or a direct link to its repository. It only cites and links to external, pre-existing models such as GPT-2 and DensePhrases. |
| Open Datasets | Yes | We train our model on the training set of MiniPile (Kaddour, 2023), and use the English Wikipedia dump of March 1, 2022 as supporting documents. Datasets: https://huggingface.co/datasets/JeanKaddour/minipile, https://huggingface.co/datasets/wikipedia, and https://huggingface.co/datasets/gamino/wiki_medical_terms |
| Dataset Splits | Yes | We train our model on the training set of MiniPile (Kaddour, 2023)... We conduct open-ended text generation experiments on the test set of MiniPile (Kaddour, 2023)... MedMCQA (Pal et al., 2022) is a comprehensive, high-quality dataset designed for biomedical question-answering. We use its validation split, which consists of 4,183 questions. |
| Hardware Specification | Yes | The entire preprocessing process, including syntactic parsing, phrase selection, and semantic matching, takes approximately 24 hours on 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions tools like 'Stanford Parser' (via a link to Stanza) and 'FAISS', but it does not specify concrete version numbers for these or any other software dependencies required for reproducibility. |
| Experiment Setup | Yes | While revising the training oracles via self-reinforcement, we retrieve the top k = 128 phrases for each prefix. In all experiments, we set k to 128 (see the analysis on k in Table 7 in Appendix G) and p to 0.95. To control the ratio of phrase retrieval, we filter out phrases with probabilities below a threshold. The threshold is set to ϕ = 0.4 if not otherwise specified. |
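To make the reported setup concrete, the phrase-retrieval filtering step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the softmax-based scoring, and the example scores are all assumptions; only the hyperparameters (top k = 128 phrases, probability threshold ϕ = 0.4) come from the paper.

```python
import numpy as np

def filter_retrieved_phrases(phrase_scores, k=128, threshold=0.4):
    """Keep the top-k retrieved phrase candidates by raw score, then
    drop any whose softmax-normalized probability is below `threshold`
    (controlling the ratio of phrase retrieval, per the paper's setup).

    Hypothetical sketch: the paper specifies k and the threshold phi,
    but not this exact scoring or normalization scheme.
    """
    scores = np.asarray(phrase_scores, dtype=float)
    top_idx = np.argsort(scores)[::-1][:k]   # indices of top-k candidates
    probs = np.exp(scores - scores.max())    # numerically stable softmax
    probs /= probs.sum()
    # Retain only top-k phrases whose probability clears the threshold.
    return [int(i) for i in top_idx if probs[i] >= threshold]
```

With ϕ = 0.4, only phrases the model is fairly confident about survive; lowering the threshold shifts generation toward more phrase retrieval and less token-level generation.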