Improving Biomedical Information Retrieval with Neural Retrievers
Authors: Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral (pp. 11038-11046)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and analysis on the BioASQ challenge suggest that our proposed method leads to large gains over existing neural approaches and beats BM25 in the small-corpus setting. |
| Researcher Affiliation | Collaboration | Man Luo,1 Arindam Mitra,2 Tejas Gokhale,1 Chitta Baral1 1 Arizona State University 2 Microsoft mluo26@asu.edu, arindam.mitra2@gmail.com, tgokhale@asu.edu, chitta@asu.edu |
| Pseudocode | No | The paper includes figures illustrating processes and mathematical formulas but does not provide any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/luomancs/neural_retrieval_for_biomedical_domain.git |
| Open Datasets | Yes | Dataset. We focus on the document retrieval task in BioASQ8 (Tsatsaronis et al. 2015) with a goal of retrieving a list of relevant documents to a question. This dataset contains 3234 questions in the training set and five test sets (B1, B2, B3, B4, B5) with 100 questions each. |
| Dataset Splits | No | This dataset contains 3234 questions in the training set and five test sets (B1, B2, B3, B4, B5) with 100 questions each. The paper does not explicitly mention a validation split for the BioASQ dataset. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud computing instance types) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using 'Pyserini (Lin et al. 2021)', 'BioBERT (Lee et al. 2020)', and 'T5 (Raffel et al. 2020)' but does not provide specific version numbers for these or other ancillary software components (e.g., Python, PyTorch). |
| Experiment Setup | Yes | For Poly-DPR, the number of representations K is set as 6 after a hyper-parameter search. While larger values of K improve results, they make indexing slower. For BM25, we use an implementation from Pyserini (Lin et al. 2021) with default hyperparameters k1=0.9 and b=0.4. |
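To make the BM25 baseline concrete, below is a minimal, self-contained sketch of Lucene-style BM25 scoring with the k1=0.9 and b=0.4 defaults quoted above. This is an illustrative reimplementation, not Pyserini's actual code; the function name and tokenized-list input format are assumptions for the example.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=0.9, b=0.4):
    """Score each tokenized document against a query with BM25.

    k1=0.9 and b=0.4 match the Pyserini defaults cited in the paper.
    `docs` is a list of token lists; returns one score per document.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # document frequency of each distinct query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term unseen in the corpus contributes nothing
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # term-frequency saturation (k1) and length normalization (b)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores
```

In Pyserini itself, the equivalent configuration is applied on a searcher via its `set_bm25` method with the same two parameters.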