Co-occurrence is not Factual Association in Language Models

Authors: Xiao Zhang, Miao Li, Ji Wu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On both synthetic and real-world corpora, the two proposed strategies improve the generalization of the knowledge learned during finetuning to reasoning scenarios such as indirect and multi-hop question answering.
Researcher Affiliation | Academia | Xiao Zhang, Department of Electronics Engineering, Tsinghua University (xzhang19@mails.tsinghua.edu.cn); Miao Li, Department of Electronics Engineering, Tsinghua University (miao-li@tsinghua.edu.cn); Ji Wu, Department of Electronics Engineering and College of AI, Tsinghua University; Beijing National Research Center for Information Science and Technology; Center for Big Data and Clinical Research, Institute for Precision Medicine, Tsinghua University (wuji_ee@mail.tsinghua.edu.cn)
Pseudocode | No | The paper describes methods and processes in narrative text and figures, but does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | We release the synthetic corpus and the code for the experiments in this work to facilitate further research on factual knowledge learning in language models. Code: https://github.com/xiaozeroone/fact_learning
Open Datasets | Yes | We create a synthetic knowledge dataset called Country-city-animals... Dataset: https://huggingface.co/datasets/xiaozeroone/Country-city-animals (a loading sketch follows the table).
Dataset Splits | No | The paper describes the datasets used and the evaluation metrics, but it does not explicitly provide the specific percentages or counts for training, validation, and test dataset splits. It mentions "5-shot accuracies" for evaluation, which refers to the few-shot prompting setup, not dataset partitioning.
Hardware Specification | Yes | All experiments on LLaMA 3 8B and Gemma 7B are performed on a single NVIDIA A100 GPU with 80 GB memory. Experiments on LLaMA 3 70B are performed on 3 NVIDIA A100 GPUs with 80 GB memory.
Software Dependencies | No | The paper mentions the use of the Hugging Face Transformers and PEFT libraries and the EleutherAI lm-evaluation-harness, but it does not provide version numbers for these software components or for Python (an illustrative 5-shot evaluation call follows the table).
Experiment Setup | Yes | We use the Adam optimizer with a batch size of 16. The learning rate and number of epochs are selected via a grid search... Linear learning rate decay is used with 10% warmup steps. The hyperparameter search ranges are: learning rate (full-model finetune): 1e-5, 2e-5, 5e-5; learning rate (low-rank finetune): 1e-4, 2e-4, 5e-4; number of epochs: 3, 5, 10, 20. For low-rank (LoRA) finetuning, we use rank r = 64 and α = 16. (A finetuning configuration sketch follows the table.)
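
The Open Datasets row above points to the released Country-city-animals dataset. Below is a minimal sketch of how it could be loaded and inspected, assuming it resolves through the Hugging Face datasets library under the ID taken from the URL in the table; the split and field names are read from whatever the dataset defines rather than assumed here.

    # Minimal sketch: load and inspect the released synthetic dataset.
    # Assumes the Hugging Face `datasets` library is installed and that the
    # dataset ID matches the URL given in the table; splits and fields are
    # printed as defined by the dataset, not assumed.
    from datasets import load_dataset

    dataset = load_dataset("xiaozeroone/Country-city-animals")

    print(dataset)                     # available splits and their sizes
    first_split = next(iter(dataset.values()))
    print(first_split.features)        # field names and types
    print(first_split[0])              # one example record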
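
The Dataset Splits and Software Dependencies rows refer to 5-shot accuracies computed with EleutherAI's lm-evaluation-harness, without giving versions or an invocation. The following sketch shows how such a 5-shot run is typically launched through the harness's Python API (v0.4-style); the task name my_fact_task is a hypothetical placeholder rather than a task defined by the paper, and the model ID is assumed from the models named in the Hardware Specification row.

    # Hedged sketch of a 5-shot evaluation via lm-evaluation-harness (v0.4-style API).
    # `my_fact_task` is a hypothetical placeholder task, not one shipped with the paper.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                                           # Hugging Face backend
        model_args="pretrained=meta-llama/Meta-Llama-3-8B",   # assumed model ID
        tasks=["my_fact_task"],                               # hypothetical task name
        num_fewshot=5,                                        # matches the reported 5-shot setup
    )
    print(results["results"])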
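
The Experiment Setup row lists hyperparameters but no code. The sketch below instantiates one point of the reported grid for low-rank finetuning with Hugging Face Transformers and PEFT, which the Software Dependencies row says the paper builds on. The model ID and output directory are assumptions, the Trainer's default AdamW stands in for the Adam optimizer the paper names, and dataset construction is omitted.

    # Hedged sketch of the low-rank (LoRA) finetuning configuration described in the
    # paper: batch size 16, linear learning rate decay with 10% warmup, LoRA rank
    # r = 64 and alpha = 16. The model ID and output_dir are assumptions.
    from transformers import AutoModelForCausalLM, TrainingArguments
    from peft import LoraConfig, get_peft_model

    model_id = "meta-llama/Meta-Llama-3-8B"   # assumed; the paper finetunes LLaMA 3 8B
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # LoRA adapter with the rank and alpha reported in the paper.
    lora_config = LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora_config)

    # One point of the reported grid: lr in {1e-4, 2e-4, 5e-4}, epochs in {3, 5, 10, 20}.
    training_args = TrainingArguments(
        output_dir="fact_learning_lora",   # hypothetical output directory
        per_device_train_batch_size=16,    # batch size 16
        learning_rate=2e-4,                # low-rank grid value
        num_train_epochs=5,                # epochs grid value
        lr_scheduler_type="linear",        # linear learning rate decay
        warmup_ratio=0.1,                  # 10% warmup steps
    )
    # transformers.Trainer(model=model, args=training_args, train_dataset=...) would
    # complete the loop; building the training dataset is omitted from this sketch.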