Improving Biomedical Information Retrieval with Neural Retrievers

Authors: Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral (pp. 11038-11046)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and analysis on the BioASQ challenge suggest that our proposed method leads to large gains over existing neural approaches and beats BM25 in the small-corpus setting.
Researcher Affiliation | Collaboration | Man Luo,1 Arindam Mitra,2 Tejas Gokhale,1 Chitta Baral1 (1 Arizona State University, 2 Microsoft); mluo26@asu.edu, arindam.mitra2@gmail.com, tgokhale@asu.edu, chitta@asu.edu
Pseudocode | No | The paper includes figures illustrating processes and mathematical formulas but does not provide any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/luomancs/neural_retrieval_for_biomedical_domain.git
Open Datasets | Yes | Dataset. We focus on the document retrieval task in BioASQ8 (Tsatsaronis et al. 2015) with a goal of retrieving a list of relevant documents to a question. This dataset contains 3234 questions in the training set and five test sets (B1, B2, B3, B4, B5) with 100 questions each.
Dataset Splits | No | This dataset contains 3234 questions in the training set and five test sets (B1, B2, B3, B4, B5) with 100 questions each. The paper does not explicitly mention a validation split for the BioASQ dataset.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud computing instance types) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions using 'Pyserini (Lin et al. 2021)', 'BioBERT (Lee et al. 2020)', and 'T5 (Raffel et al. 2020)' but does not provide specific version numbers for these or other ancillary software components (e.g., Python, PyTorch).
Experiment Setup | Yes | For Poly-DPR, the number of representations K is set to 6 after a hyper-parameter search. While larger values of K improve results, they make indexing slower. For BM25, we use an implementation from Pyserini (Lin et al. 2021) with default hyperparameters k1=0.9 and b=0.4.
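
For intuition on the Poly-DPR setting quoted above, the sketch below shows one way a passage can be scored against a query when the passage encoder emits K = 6 representations instead of one. The max-over-inner-products aggregation, the tensor shapes, and the function name are illustrative assumptions, not the paper's exact formulation; the paper may aggregate over the K representations differently (e.g., via attention).

```python
import torch

def poly_dpr_score(query_vec: torch.Tensor, passage_reps: torch.Tensor) -> torch.Tensor:
    """Score one query against one passage that has multiple representations.

    query_vec:    (d,)    single dense query embedding
    passage_reps: (K, d)  K dense representations of the passage (K = 6 in the paper)

    Assumption: relevance is the maximum inner product over the K passage
    representations; the paper's actual aggregation may differ.
    """
    sims = passage_reps @ query_vec   # (K,) inner products, one per representation
    return sims.max()

# Toy usage with random vectors (d = 768, matching BERT-sized encoders)
torch.manual_seed(0)
query = torch.randn(768)
passage = torch.randn(6, 768)
print(poly_dpr_score(query, passage))
```

Keeping K small matters here because each passage contributes K vectors to the index, so index size and retrieval cost grow with K, which is consistent with the reported trade-off between quality and indexing speed.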
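The BM25 baseline in the same row uses Pyserini with default hyperparameters k1 = 0.9 and b = 0.4. A minimal retrieval sketch under those settings follows; the index path and the example question are placeholders, it assumes a Lucene index of the BioASQ corpus has already been built with Pyserini's indexing tools, and the import path corresponds to recent Pyserini releases.

```python
from pyserini.search.lucene import LuceneSearcher

# Placeholder path: assumes a Lucene index of the biomedical corpus
# was built beforehand with Pyserini's indexing utilities.
searcher = LuceneSearcher("indexes/bioasq_corpus")

# BM25 defaults reported in the paper (k1 = 0.9, b = 0.4), set explicitly here.
searcher.set_bm25(k1=0.9, b=0.4)

# Placeholder biomedical question; retrieve the top 10 documents.
hits = searcher.search("What is the role of BRCA1 in breast cancer?", k=10)
for rank, hit in enumerate(hits, start=1):
    print(f"{rank:2d}. docid={hit.docid} score={hit.score:.3f}")
```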