On the Generation of Medical Question-Answer Pairs
Authors: Sheng Shen, Yaliang Li, Nan Du, Xian Wu, Yusheng Xie, Shen Ge, Tao Yang, Kai Wang, Xingzheng Liang, Wei Fan (pp. 8822-8829)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system. |
| Researcher Affiliation | Collaboration | Sheng Shen¹, Yaliang Li², Nan Du³, Xian Wu³, Yusheng Xie³, Shen Ge³, Tao Yang³, Kai Wang³, Xingzheng Liang³, Wei Fan³; ¹University of California at Berkeley; ²Alibaba Group; ³Tencent |
| Pseudocode | No | The paper describes its methods textually in the 'Methodology' section but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'Our full version paper with supplemented material is publicly available at https://arxiv.org/abs/1811.00681,' but this link is to the paper itself, not a code repository. There is no explicit statement about releasing source code for the described methodology. |
| Open Datasets | No | The paper states, 'We collect real-world medical QA pairs from the National Medical Licensing Examination of China (denoted as NMLEC QA).' It mentions a medical entity dictionary built from 'medical Wikipedia-style pages' and gives a URL (xywy.com) in a footnote, but this is not a dataset repository, and the NMLEC QA dataset was collected by the authors with no access method provided. |
| Dataset Splits | No | The paper mentions using 'NMLEC 2017 as the test set,' but does not provide specific percentages or counts for training, validation, and test splits, nor does it refer to standard predefined splits for the dataset collected. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for conducting the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Elasticsearch' and a 'Bi-LSTM-CRF model' but does not provide specific version numbers for these or any other software components, which are needed for a reproducible setup. |
| Experiment Setup | No | The paper describes the model architecture and evaluation metrics but does not provide specific details about the experimental setup such as hyperparameters (e.g., learning rate, batch size, number of epochs) or optimizer settings. |