Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions

Authors: Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Zhangdaihong Liu, Xu Sun, Yang Yang, David Clifton

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that, using our method, the performance of five different models can be substantially boosted across all metrics, with up to 20%, 11% and 19% relative improvements in BLEU-4, ROUGE-L and METEOR, respectively. Meanwhile, we show results from human evaluations to measure the effectiveness in terms of its usefulness for clinical practice.
Researcher Affiliation | Collaboration | (1) Department of Engineering Science, University of Oxford; (2) School of ECE, Peking University; (3) Department of Electrical Engineering, Yale University; (4) Tencent JARVIS Lab, China; (5) Oxford-Suzhou Centre for Advanced Research, China; (6) MOE Key Lab of Computational Linguistics, School of Computer Science, Peking University; (7) School of Public Health, Shanghai Jiao Tong University School of Medicine, China
Pseudocode | No | The paper describes the method using text and diagrams (Figure 2), but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/AI-in-Health/Patient-Instructions
Open Datasets | Yes | In detail, we collect the PI dataset from the publicly-accessible MIMIC-III v1.4 resource [20, 19]
Dataset Splits | Yes | We randomly partition the dataset into 80%-10%-10% train-validation-test partitions according to patients. (A patient-level split of this kind is sketched in code after the table.)
Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU types) used for the experiments. It only mentions 'The model size d is set to 512.'
Software Dependencies | No | The paper mentions the models and optimizers used (e.g., Transformer, Adam optimizer, BERT encoder) but does not provide version numbers for software libraries or dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | The model size d is set to 512. For a patient, we directly concatenate all available patient's health records during hospitalization as input... We adopt Transformer [47] as the record encoder. Based on the average performance on the validation set, the number of retrieved previous PIs, N_P, is set to 20 for all three codes (see Appendix B). During training, we use the Adam optimizer [21] with a batch size of 128 and a learning rate of 10^-4 for parameter optimization. We perform early stopping based on BLEU-4 with a maximum of 150 epochs. During testing, we apply beam search of size 2 and a repetition penalty of 2.5. (These hyperparameters are sketched in code after the table.)
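For reference, the 'Dataset Splits' row describes a patient-level 80%-10%-10% partition, i.e., a split over patients rather than over individual records, so no patient appears in more than one split. A minimal sketch of such a split follows; the `patient_ids` argument, the fixed seed, and the rounding behavior are assumptions for illustration, not details taken from the paper or its repository.

```python
import random

def split_by_patient(patient_ids, seed=0):
    """Partition patients (not individual records) 80/10/10 into
    train/validation/test sets, so no patient spans two splits."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    return (ids[:n_train],                    # train: ~80% of patients
            ids[n_train:n_train + n_val],     # validation: ~10%
            ids[n_train + n_val:])            # test: remaining ~10%
```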
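Similarly, the hyperparameters quoted in the 'Experiment Setup' row map onto a standard PyTorch configuration roughly as follows. This is a sketch under assumptions: `torch.nn.Transformer` stands in for the paper's record encoder and instruction decoder, and the Hugging Face-style `generate()` keyword names are illustrative; none of this is the authors' released implementation.

```python
import torch

# Hyperparameters quoted in the 'Experiment Setup' row.
D_MODEL = 512          # model size d
BATCH_SIZE = 128
LEARNING_RATE = 1e-4   # Adam learning rate of 10^-4
MAX_EPOCHS = 150       # with early stopping on validation BLEU-4
N_P = 20               # number of retrieved previous PIs

# Placeholder for the paper's Transformer-based model
# (not the authors' code).
model = torch.nn.Transformer(d_model=D_MODEL)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

# Test-time decoding settings, e.g. for a Hugging Face-style
# model.generate(...) call: beam size 2, repetition penalty 2.5.
generation_kwargs = dict(num_beams=2, repetition_penalty=2.5)
```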