Improving Biomedical Information Retrieval with Neural Retrievers
Authors: Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral (pp. 11038-11046)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and analysis on the BioASQ challenge suggest that our proposed method leads to large gains over existing neural approaches and beats BM25 in the small-corpus setting. |
| Researcher Affiliation | Collaboration | Man Luo,1 Arindam Mitra,2 Tejas Gokhale,1 Chitta Baral1 1 Arizona State University 2 Microsoft mluo26@asu.edu, arindam.mitra2@gmail.com, tgokhale@asu.edu, chitta@asu.edu |
| Pseudocode | No | The paper includes figures illustrating processes and mathematical formulas but does not provide any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/luomancs/neural_retrieval_for_biomedical_domain.git |
| Open Datasets | Yes | Dataset. We focus on the document retrieval task in BioASQ8 (Tsatsaronis et al. 2015) with a goal of retrieving a list of relevant documents to a question. This dataset contains 3234 questions in the training set and five test sets (B1, B2, B3, B4, B5) with 100 questions each. |
| Dataset Splits | No | This dataset contains 3234 questions in the training set and five test sets (B1, B2, B3, B4, B5) with 100 questions each. The paper does not explicitly mention a validation split for the BioASQ dataset. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud computing instance types) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using 'Pyserini (Lin et al. 2021)', 'BioBERT (Lee et al. 2020)', and 'T5 (Raffel et al. 2020)' but does not provide specific version numbers for these or other ancillary software components (e.g., Python, PyTorch). |
| Experiment Setup | Yes | For Poly-DPR, the number of representations K is set as 6 after a hyper-parameter search. While larger values of K improve results, they make indexing slower. For BM25, we use an implementation from Pyserini (Lin et al. 2021) with default hyperparameters k1=0.9 and b=0.4. |
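To make the BM25 baseline concrete, below is a minimal, self-contained sketch of Lucene-style BM25 scoring with the k1=0.9 and b=0.4 defaults quoted above. This is an illustrative reimplementation, not Pyserini's actual code; the function name and tokenized-list input format are assumptions for the example.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=0.9, b=0.4):
    """Score each tokenized document against a query with BM25.

    k1=0.9 and b=0.4 match the Pyserini defaults cited in the paper.
    `docs` is a list of token lists; returns one score per document.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # document frequency of each distinct query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term unseen in the corpus contributes nothing
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # term-frequency saturation (k1) and length normalization (b)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores
```

In Pyserini itself, the equivalent configuration is applied on a searcher via its `set_bm25` method with the same two parameters.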