Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

Authors: Xinlu Zhang, Shiyang Li, Xianjun Yang, Chenxin Tian, Yao Qin, Linda Ruth Petzold

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results in two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments in two general domain datasets showcase its generalizability and broad applicability.
Researcher Affiliation | Academia | ¹University of California, Santa Barbara; ²Chinese Academy of Medical Sciences and Peking Union Medical College
Pseudocode | No | No explicit pseudocode or algorithm blocks were found; methods are described in prose and diagrams.
Open Source Code | Yes | Our code can be found at https://github.com/XZhang97666/PrivacyBoost-SLM. Our codes and generated data are public at https://github.com/XZhang97666/PrivacyBoost-SLM.
Open Datasets | Yes | MedQA (Jin et al., 2020) contains 4-way multiple-choice questions from the US Medical Licensing Exam. It has 10,178/1,272/1,273 instances in the training/development/test sets. Results on the development and test sets are reported.
Dataset Splits | Yes | MedQA (Jin et al., 2020) contains 4-way multiple-choice questions from the US Medical Licensing Exam. It has 10,178/1,272/1,273 instances in the training/development/test sets.
Hardware Specification | Yes | We implement both SFT and FTC based on huggingface transformers (Wolf et al., 2020), and train on NVIDIA A40-48GB GPUs.
Software Dependencies | No | The paper mentions software such as 'huggingface transformers' and 'AdamW' but does not specify their version numbers, which would be needed for a reproducible software setup.
Experiment Setup | Yes | For all datasets, we utilize AdamW (Loshchilov and Hutter, 2019) as the optimizer. For MedQA and HEADQA, we set learning rates of 5 × 10⁻⁵, 5 × 10⁻⁵, and 2 × 10⁻⁶ for BioLinkBERT-Base, BioLinkBERT-Large, and BioMedLM in both FTC and SFT settings. For MedMCQA, we set learning rates of 2 × 10⁻⁵, 2 × 10⁻⁵, and 2 × 10⁻⁶ for BioLinkBERT-Base, BioLinkBERT-Large, and BioMedLM in both FTC and SFT settings. For BioLinkBERT-Base and BioLinkBERT-Large, we limit training to 100 epochs with a 200-step warm-up and apply early stopping after 5 epochs without validation improvement. Batch sizes are 8 for few-shot and full-training scenarios across all datasets. For BioMedLM, we set the training epochs to 10 for all datasets.
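
The Dataset Splits row above reports 10,178/1,272/1,273 MedQA instances in the training/development/test sets. The following is a minimal sketch (not from the paper) for checking those counts on a local copy of the MedQA (USMLE) data; the JSONL file paths are hypothetical placeholders.

```python
# Verify the reported MedQA split sizes from local JSONL files.
# The paths below are placeholders, not the paper's actual data layout.
from datasets import load_dataset

splits = load_dataset(
    "json",
    data_files={
        "train": "medqa/train.jsonl",       # expected: 10,178 questions
        "validation": "medqa/dev.jsonl",    # expected: 1,272 questions
        "test": "medqa/test.jsonl",         # expected: 1,273 questions
    },
)
for name, split in splits.items():
    print(f"{name}: {len(split)} instances")
```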
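Since the Software Dependencies row notes that no version numbers are given, anyone rerunning the released code may want to record the environment they actually used. A generic snippet like the one below (not part of the paper's repository) is sufficient for that.

```python
# Record the versions of the key libraries mentioned in the paper.
import sys
import torch
import transformers

print("python       :", sys.version.split()[0])
print("torch        :", torch.__version__)
print("transformers :", transformers.__version__)
print("cuda runtime :", torch.version.cuda)
```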
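The Experiment Setup and Hardware Specification rows together describe the fine-tuning recipe. The sketch below is a possible reconstruction (not the authors' released code) of that configuration for BioLinkBERT-Base on MedQA: AdamW, learning rate 5 × 10⁻⁵, 200 warm-up steps, up to 100 epochs, batch size 8, and early stopping after 5 epochs without validation improvement. Here `model`, `train_ds`, and `dev_ds` are placeholders for a multiple-choice model and tokenized datasets, and early stopping is monitored on validation loss as an assumption, since the paper does not state the monitored metric.

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="ftc-biolinkbert-base-medqa",
    optim="adamw_torch",                  # AdamW (Loshchilov and Hutter, 2019)
    learning_rate=5e-5,                   # 2e-6 for BioMedLM; 2e-5 on MedMCQA
    warmup_steps=200,
    num_train_epochs=100,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",          # named `eval_strategy` in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",    # assumption: early stopping on validation loss
    greater_is_better=False,
)

trainer = Trainer(
    model=model,                          # e.g. BioLinkBERT with a multiple-choice head
    args=args,
    train_dataset=train_ds,
    eval_dataset=dev_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()
```

For BioMedLM, the quoted setup instead caps training at 10 epochs with a 2 × 10⁻⁶ learning rate; the rest of the arguments would be analogous.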