Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
CFEVER: A Chinese Fact Extraction and VERification Dataset
Authors: Ying-Jia Lin, Chun-Yi Lin, Chia-Jen Yeh, Yi-Ting Li, Yun-Yu Hu, Chih-Hao Hsu, Mei-Feng Lee, Hung-Yu Kao
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, through the experiments with the state-of-the-art approaches developed on the FEVER dataset and a simple baseline for CFEVER, we demonstrate that our dataset is a new rigorous benchmark for factual extraction and verification, which can be further used for developing automated systems to alleviate human fact-checking efforts. |
| Researcher Affiliation | Academia | Ying-Jia Lin, Chun-Yi Lin, Chia-Jen Yeh, Yi-Ting Li, Yun-Yu Hu, Chih-Hao Hsu, Mei-Feng Lee, Hung-Yu Kao Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan EMAIL, EMAIL |
| Pseudocode | No | The paper describes various algorithms and approaches (e.g., BM25, BERT-based methods) but does not provide them in pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | CFEVER is available at https://ikmlab.github.io/CFEVER. (This link leads to a website which provides a link to the code repository for the dataset and baselines.) |
| Open Datasets | Yes | We present CFEVER, a Chinese dataset designed for Fact Extraction and VERification. CFEVER is available at https://ikmlab.github.io/CFEVER. |
| Dataset Splits | Yes | There are 30,012 claims in the CFEVER dataset. We split 80%, 10%, and 10% of the claims into the training, development, and test sets, respectively. The statistics of the dataset are shown in Table 3. |
| Hardware Specification | No | The paper mentions models used (e.g., BERT, GPT-3.5, GPT-4) and links to some pre-trained model repositories, but it does not specify the hardware (e.g., specific GPU or CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like 'OpenCC' and 'Elasticsearch' and models like 'BERT' and 'GPT-3.5', but it does not provide specific version numbers for these software dependencies or libraries. |
| Experiment Setup | No | The paper describes the overall setup of the baseline systems, including component usage (e.g., fine-tuning BERT, concatenating top five evidence sentences), but it does not provide specific numerical hyperparameter values such as learning rate, batch size, or number of epochs. |
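
The 80%/10%/10% split reported under Dataset Splits can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `split_claims`, the shuffling step, the seed, and the rounding rule are all assumptions made for illustration, and the exact per-split counts in the paper's Table 3 may differ from what this produces.

```python
import random


def split_claims(claim_ids, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle claim IDs and cut them into train/dev/test lists.

    Hypothetical helper: the paper states only the 80/10/10 ratio,
    not how claims were shuffled or how fractions were rounded.
    """
    ids = list(claim_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)

    n_train = int(len(ids) * train_frac)  # rounds down (assumption)
    n_dev = int(len(ids) * dev_frac)      # rounds down (assumption)

    train = ids[:n_train]
    dev = ids[n_train:n_train + n_dev]
    test = ids[n_train + n_dev:]          # remainder goes to the test set
    return train, dev, test


# 30,012 is the total number of claims reported for CFEVER.
train, dev, test = split_claims(range(30012))
print(len(train), len(dev), len(test))
```

Fixing the seed keeps the split repeatable across runs, which matters for a benchmark where results are compared on a shared test set.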