Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions

Authors: Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Zhangdaihong Liu, Xu Sun, Yang Yang, David Clifton

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that, using our method, the performance of five different models can be substantially boosted across all metrics, with up to 20%, 11% and 19% relative improvements in BLEU-4, ROUGE-L and METEOR, respectively. Meanwhile, we show results from human evaluations to measure the effectiveness in terms of its usefulness for clinical practice.
Researcher Affiliation | Collaboration | (1) Department of Engineering Science, University of Oxford; (2) School of ECE, Peking University; (3) Department of Electrical Engineering, Yale University; (4) Tencent JARVIS Lab, China; (5) Oxford-Suzhou Centre for Advanced Research, China; (6) MOE Key Lab of Computational Linguistics, School of Computer Science, Peking University; (7) School of Public Health, Shanghai Jiao Tong University School of Medicine, China
Pseudocode | No | The paper describes the method using text and diagrams (Figure 2), but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/AI-in-Health/Patient-Instructions
Open Datasets | Yes | In detail, we collect the PI dataset from the publicly-accessible MIMIC-III v1.4 resource [20, 19]
Dataset Splits | Yes | We randomly partition the dataset into 80%-10%-10% train-validation-test partitions according to patients. (A patient-level split of this kind is sketched in code after the table.)
Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU types) used for the experiments. It only mentions 'The model size d is set to 512.'
Software Dependencies | No | The paper mentions the models and optimizers used (e.g., Transformer, Adam optimizer, BERT encoder) but does not provide version numbers for software libraries or dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | The model size d is set to 512. For a patient, we directly concatenate all available patient's health records during hospitalization as input... We adopt Transformer [47] as the record encoder. Based on the average performance on the validation set, the number of retrieved previous PIs, N_P, is set to 20 for all three codes (see Appendix B). During training, we use the Adam optimizer [21] with a batch size of 128 and a learning rate of 10^-4 for parameter optimization. We perform early stopping based on BLEU-4 with a maximum of 150 epochs. During testing, we apply beam search of size 2 and a repetition penalty of 2.5. (These hyperparameters are sketched in code after the table.)
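For reference, the 'Dataset Splits' row describes a patient-level 80%-10%-10% partition, i.e., a split over patients rather than over individual records, so no patient appears in more than one split. A minimal sketch of such a split follows; the `patient_ids` argument, the fixed seed, and the rounding behavior are assumptions for illustration, not details taken from the paper or its repository.

```python
import random

def split_by_patient(patient_ids, seed=0):
    """Partition patients (not individual records) 80/10/10 into
    train/validation/test sets, so no patient spans two splits."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    return (ids[:n_train],                    # train: ~80% of patients
            ids[n_train:n_train + n_val],     # validation: ~10%
            ids[n_train + n_val:])            # test: remaining ~10%
```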
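Similarly, the hyperparameters quoted in the 'Experiment Setup' row map onto a standard PyTorch configuration roughly as follows. This is a sketch under assumptions: `torch.nn.Transformer` stands in for the paper's record encoder and instruction decoder, and the Hugging Face-style `generate()` keyword names are illustrative; none of this is the authors' released implementation.

```python
import torch

# Hyperparameters quoted in the 'Experiment Setup' row.
D_MODEL = 512          # model size d
BATCH_SIZE = 128
LEARNING_RATE = 1e-4   # Adam learning rate of 10^-4
MAX_EPOCHS = 150       # with early stopping on validation BLEU-4
N_P = 20               # number of retrieved previous PIs

# Placeholder for the paper's Transformer-based model
# (not the authors' code).
model = torch.nn.Transformer(d_model=D_MODEL)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

# Test-time decoding settings, e.g. for a Hugging Face-style
# model.generate(...) call: beam size 2, repetition penalty 2.5.
generation_kwargs = dict(num_beams=2, repetition_penalty=2.5)
```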