BAND: Biomedical Alert News Dataset
Authors: Zihao Fu, Meiru Zhang, Zaiqiao Meng, Yannan Shen, David Buckeridge, Nigel Collier
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to demonstrate existing models' capabilities and limitations in handling epidemiology-specific tasks. It is worth noting that some models may lack the human-like inference capability required to fully utilize the corpus. To the best of our knowledge, the BAND corpus is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike. |
| Researcher Affiliation | Academia | 1 Language Technology Lab, University of Cambridge; 2 School of Computing Science, University of Glasgow; 3 School of Population and Global Health, McGill University |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our dataset and code are available at https://github.com/fuzihaofzh/BAND |
| Open Datasets | Yes | Our dataset and code are available at https://github.com/fuzihaofzh/BAND |
| Dataset Splits | Yes | We provide two different sampled splits, namely the Rand Split and the Stratified Split, as shown in Table 2. Rand Split. This split randomly partitions the corpus into train/dev/test sets, without considering any other factors. Stratified Split. In order to assess the model's ability to accurately answer sparse questions with limited positive answers, it is crucial to focus on these specific samples in upcoming research. To accomplish this, we employ a split strategy that prioritizes samples with positive answers for sparse questions. These samples are divided in a ratio of 5:1:4 for the train/dev/test sets respectively. (See the sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools and models like “Label Studio” and various LLMs (T5, Bart, GPT2, etc.), but it does not specify version numbers for these software dependencies or any programming languages/libraries required for reproduction. |
| Experiment Setup | No | The paper mentions fine-tuning models and evaluation metrics, but it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings. |
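The Stratified Split quoted above lends itself to a short illustration. Below is a minimal Python sketch, not the authors' implementation (which lives in the linked repository): it assumes each sample is a dict carrying a hypothetical `has_positive_sparse_answer` flag, and it applies the stated 5:1:4 ratio within each stratum so that samples with positive answers to sparse questions appear in all three sets.

```python
import random

def stratified_split(samples, ratios=(5, 1, 4), seed=0):
    """Split samples into (train, dev, test) portions.

    Samples flagged as having a positive answer to a sparse question
    (hypothetical `has_positive_sparse_answer` key) are split in the
    5:1:4 ratio described in the paper; the remaining samples are
    split the same way so every set contains both kinds.
    """
    rng = random.Random(seed)
    positive = [s for s in samples if s.get("has_positive_sparse_answer")]
    rest = [s for s in samples if not s.get("has_positive_sparse_answer")]

    def divide(group):
        # Shuffle, then cut the group at the cumulative ratio boundaries.
        rng.shuffle(group)
        total = sum(ratios)
        n_train = len(group) * ratios[0] // total
        n_dev = len(group) * ratios[1] // total
        return (group[:n_train],
                group[n_train:n_train + n_dev],
                group[n_train + n_dev:])

    splits = ([], [], [])
    for group in (positive, rest):
        for bucket, part in zip(splits, divide(group)):
            bucket.extend(part)
    return splits

# Usage (corpus: list of sample dicts):
# train, dev, test = stratified_split(corpus)
```

Splitting each stratum separately, rather than the corpus as a whole, is what guarantees the sparse positive samples are not accidentally concentrated in one set, which is the stated motivation for the Stratified Split.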