Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Understanding Factual Knowledge of Large Language Models
Authors: Xuming Hu, Junzhe Chen, Xiaochuan Li, Yufei Guo, Lijie Wen, Philip S. Yu, Zhijiang Guo
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on different sizes and types of LLMs show that existing LLMs still lack factual knowledge and suffer from various spurious correlations. |
| Researcher Affiliation | Academia | (1) Tsinghua University; (2) The Hong Kong University of Science and Technology (Guangzhou); (3) University of Illinois at Chicago; (4) University of Cambridge |
| Pseudocode | No | The paper describes methods in text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The dataset Pinocchio and our codes are publicly available at: https://github.com/THU-BPM/Pinocchio. |
| Open Datasets | Yes | The dataset Pinocchio and our codes are publicly available at: https://github.com/THU-BPM/Pinocchio. |
| Dataset Splits | No | The paper describes the Pinocchio dataset and its subsets but does not explicitly state specific training/validation/test dataset splits (e.g., percentages or counts) within the main text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, used to replicate the experiments. |
| Experiment Setup | No | While the paper describes various prompt strategies used in experiments, it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size) or detailed system-level training settings. |