BAND: Biomedical Alert News Dataset
Authors: Zihao Fu, Meiru Zhang, Zaiqiao Meng, Yannan Shen, David Buckeridge, Nigel Collier
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to demonstrate existing models' capabilities and limitations in handling epidemiology-specific tasks. It is worth noting that some models may lack the human-like inference capability required to fully utilize the corpus. To the best of our knowledge, the BAND corpus is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike. |
| Researcher Affiliation | Academia | 1 Language Technology Lab, University of Cambridge; 2 School of Computing Science, University of Glasgow; 3 School of Population and Global Health, McGill University |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our dataset and code are available at https://github.com/fuzihaofzh/BAND |
| Open Datasets | Yes | Our dataset and code are available at https://github.com/fuzihaofzh/BAND |
| Dataset Splits | Yes | We provide two different sampled splits, namely the Rand Split and the Stratified Split, as shown in Table 2. Rand Split. This split randomly partitions the corpus into train/dev/test sets, without considering any other factors. Stratified Split. In order to assess the model's ability to accurately answer sparse questions with limited positive answers, it is crucial to focus on these specific samples in upcoming research. To accomplish this, we employ a split strategy that prioritizes samples with positive answers for sparse questions. These samples are divided in a ratio of 5:1:4 for the train/dev/test sets respectively. (See the sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools and models like “Label Studio” and various LLMs (T5, Bart, GPT2, etc.), but it does not specify version numbers for these software dependencies or any programming languages/libraries required for reproduction. |
| Experiment Setup | No | The paper mentions fine-tuning models and evaluation metrics, but it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings. |
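The Stratified Split quoted above lends itself to a short illustration. Below is a minimal Python sketch, not the authors' implementation (which lives in the linked repository): it assumes each sample is a dict carrying a hypothetical `has_positive_sparse_answer` flag, and it applies the stated 5:1:4 ratio within each stratum so that samples with positive answers to sparse questions appear in all three sets.

```python
import random

def stratified_split(samples, ratios=(5, 1, 4), seed=0):
    """Split samples into (train, dev, test) portions.

    Samples flagged as having a positive answer to a sparse question
    (hypothetical `has_positive_sparse_answer` key) are split in the
    5:1:4 ratio described in the paper; the remaining samples are
    split the same way so every set contains both kinds.
    """
    rng = random.Random(seed)
    positive = [s for s in samples if s.get("has_positive_sparse_answer")]
    rest = [s for s in samples if not s.get("has_positive_sparse_answer")]

    def divide(group):
        # Shuffle, then cut the group at the cumulative ratio boundaries.
        rng.shuffle(group)
        total = sum(ratios)
        n_train = len(group) * ratios[0] // total
        n_dev = len(group) * ratios[1] // total
        return (group[:n_train],
                group[n_train:n_train + n_dev],
                group[n_train + n_dev:])

    splits = ([], [], [])
    for group in (positive, rest):
        for bucket, part in zip(splits, divide(group)):
            bucket.extend(part)
    return splits

# Usage (corpus: list of sample dicts):
# train, dev, test = stratified_split(corpus)
```

Splitting each stratum separately, rather than the corpus as a whole, is what guarantees the sparse positive samples are not accidentally concentrated in one set, which is the stated motivation for the Stratified Split.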