Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

Authors: Xinlu Zhang, Shiyang Li, Xianjun Yang, Chenxin Tian, Yao Qin, Linda Ruth Petzold

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results in two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments in two general domain datasets showcase its generalizability and broad applicability.
Researcher Affiliation | Academia | ¹University of California, Santa Barbara; ²Chinese Academy of Medical Sciences and Peking Union Medical College
Pseudocode | No | No explicit pseudocode or algorithm blocks were found; methods are described in prose and diagrams.
Open Source Code | Yes | Our code can be found at https://github.com/XZhang97666/PrivacyBoost-SLM. Our codes and generated data are public at https://github.com/XZhang97666/PrivacyBoost-SLM.
Open Datasets | Yes | MedQA (Jin et al., 2020) contains 4-way multiple-choice questions from the US Medical Licensing Exam. It has 10,178/1,272/1,273 instances in the training/development/test sets. Results on the development and test sets are reported.
Dataset Splits | Yes | MedQA (Jin et al., 2020) contains 4-way multiple-choice questions from the US Medical Licensing Exam. It has 10,178/1,272/1,273 instances in the training/development/test sets.
Hardware Specification | Yes | We implement both SFT and FTC based on huggingface transformers (Wolf et al., 2020), and train on NVIDIA A40-48GB GPUs.
Software Dependencies | No | The paper mentions software such as 'huggingface transformers' and 'AdamW' but does not specify their version numbers, which would be needed for a reproducible software setup.
Experiment Setup | Yes | For all datasets, we utilize AdamW (Loshchilov and Hutter, 2019) as the optimizer. For MedQA and HEADQA, we set learning rates of 5 × 10⁻⁵, 5 × 10⁻⁵, and 2 × 10⁻⁶ for BioLinkBERT-Base, BioLinkBERT-Large, and BioMedLM in both FTC and SFT settings. For MedMCQA, we set learning rates of 2 × 10⁻⁵, 2 × 10⁻⁵, and 2 × 10⁻⁶ for BioLinkBERT-Base, BioLinkBERT-Large, and BioMedLM in both FTC and SFT settings. For BioLinkBERT-Base and BioLinkBERT-Large, we limit training to 100 epochs with a 200-step warm-up and apply early stopping after 5 epochs without validation improvement. Batch sizes are 8 for few-shot and full-training scenarios across all datasets. For BioMedLM, we set the training epochs to 10 for all datasets.
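
The Dataset Splits row above reports 10,178/1,272/1,273 MedQA instances in the training/development/test sets. The following is a minimal sketch (not from the paper) for checking those counts on a local copy of the MedQA (USMLE) data; the JSONL file paths are hypothetical placeholders.

```python
# Verify the reported MedQA split sizes from local JSONL files.
# The paths below are placeholders, not the paper's actual data layout.
from datasets import load_dataset

splits = load_dataset(
    "json",
    data_files={
        "train": "medqa/train.jsonl",       # expected: 10,178 questions
        "validation": "medqa/dev.jsonl",    # expected: 1,272 questions
        "test": "medqa/test.jsonl",         # expected: 1,273 questions
    },
)
for name, split in splits.items():
    print(f"{name}: {len(split)} instances")
```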
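Since the Software Dependencies row notes that no version numbers are given, anyone rerunning the released code may want to record the environment they actually used. A generic snippet like the one below (not part of the paper's repository) is sufficient for that.

```python
# Record the versions of the key libraries mentioned in the paper.
import sys
import torch
import transformers

print("python       :", sys.version.split()[0])
print("torch        :", torch.__version__)
print("transformers :", transformers.__version__)
print("cuda runtime :", torch.version.cuda)
```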
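The Experiment Setup and Hardware Specification rows together describe the fine-tuning recipe. The sketch below is a possible reconstruction (not the authors' released code) of that configuration for BioLinkBERT-Base on MedQA: AdamW, learning rate 5 × 10⁻⁵, 200 warm-up steps, up to 100 epochs, batch size 8, and early stopping after 5 epochs without validation improvement. Here `model`, `train_ds`, and `dev_ds` are placeholders for a multiple-choice model and tokenized datasets, and early stopping is monitored on validation loss as an assumption, since the paper does not state the monitored metric.

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="ftc-biolinkbert-base-medqa",
    optim="adamw_torch",                  # AdamW (Loshchilov and Hutter, 2019)
    learning_rate=5e-5,                   # 2e-6 for BioMedLM; 2e-5 on MedMCQA
    warmup_steps=200,
    num_train_epochs=100,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",          # named `eval_strategy` in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",    # assumption: early stopping on validation loss
    greater_is_better=False,
)

trainer = Trainer(
    model=model,                          # e.g. BioLinkBERT with a multiple-choice head
    args=args,
    train_dataset=train_ds,
    eval_dataset=dev_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
trainer.train()
```

For BioMedLM, the quoted setup instead caps training at 10 epochs with a 2 × 10⁻⁶ learning rate; the rest of the arguments would be analogous.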