Automated Lay Language Summarization of Biomedical Scientific Reviews
Authors: Yue Guo, Wei Qiu, Yizhong Wang, Trevor Cohen
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct analyses of the various challenges in performing this task, including not only summarization of the key points but also explanation of background knowledge and simplification of professional language. We experiment with state-of-the-art summarization models as well as several data augmentation techniques, and evaluate their performance using both automated metrics and human assessment. Results indicate that automatically generated summaries produced using contemporary neural architectures can achieve promising quality and readability as compared with reference summaries developed for the lay public by experts (best ROUGE-L of 50.24 and Flesch-Kincaid readability score of 13.30). |
| Researcher Affiliation | Academia | Yue Guo,¹ Wei Qiu,² Yizhong Wang,² Trevor Cohen¹ — ¹Biomedical and Health Informatics, University of Washington; ²Paul G. Allen School of Computer Science, University of Washington |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We release our code at https://github.com/qiuweipku/Plain_language_summarization |
| Open Datasets | Yes | We extracted 7,805 abstracts (source), paired with their plain language versions (target), from CDSR reviews available up to March 29, 2020. The original data is downloadable via the official API: https://www.cochranelibrary.com/cdsr/reviews |
| Dataset Splits | Yes | This resulted in a set of 5,195 source-target pairs which constitutes our training set, a further 500 abstract pairs as the validation set, and 1000 more as the test set. |
| Hardware Specification | Yes | All experiments were run using a single NVIDIA Tesla V-100 GPU. |
| Software Dependencies | No | The paper names software packages such as PyTorch, Fairseq, neural-summ-cnndm-pytorch, and the BERT extractive summarization code, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The batch size was set to 4. Other hyper-parameters were set to default values. We built the BERT extractive model using code released by the authors. The learning rate was set to 2 × 10^-3 and the batch size to 140. Other hyper-parameters were set to default values. We used the Fairseq BART implementation. All BART models were trained using the Adam optimizer. The learning rate was set to 3 × 10^-5, and learning decay was applied. The minimum length of the generated summaries was set to 100, and the maximum length was set to 700. |
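
The "Experiment Setup" row reports that the BART models were trained with Fairseq using Adam at a learning rate of 3 × 10^-5, and that generated summaries were constrained to between 100 and 700 tokens. The sketch below shows how a fine-tuned checkpoint could be run for inference through Fairseq's BART hub interface with those length constraints; the checkpoint paths, decoding batch size, beam size, and length penalty are assumptions rather than values reported in the paper.

```python
# Minimal inference sketch (not the authors' released script) for a BART model
# fine-tuned on CDSR abstract -> plain-language-summary pairs with Fairseq.
import torch
from fairseq.models.bart import BARTModel

bart = BARTModel.from_pretrained(
    "checkpoints/",                      # assumed directory with the fine-tuned model
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="cdsr-bin",        # assumed binarized CDSR data directory
)
if torch.cuda.is_available():
    bart.cuda()
bart.eval()

with open("test.source") as src, open("test.hypo", "w") as out:
    abstracts = [line.strip() for line in src]
    # Decode in small batches; beam size and length penalty are assumed values.
    for i in range(0, len(abstracts), 8):
        batch = abstracts[i : i + 8]
        with torch.no_grad():
            summaries = bart.sample(
                batch,
                beam=4,
                lenpen=2.0,
                min_len=100,     # minimum summary length reported in the paper
                max_len_b=700,   # maximum summary length reported in the paper
                no_repeat_ngram_size=3,
            )
        for s in summaries:
            out.write(s + "\n")
```

The abstract quoted in the "Research Type" row evaluates outputs with ROUGE-L (best 50.24) and the Flesch-Kincaid readability score (13.30). A sketch of computing both measures with the widely used rouge-score and textstat packages follows; the paper does not state which metric implementations were used, so the package choice is an assumption.

```python
# Sketch of the two automated measures cited above: ROUGE-L overlap with the
# expert-written plain language summary and Flesch-Kincaid grade level of the
# generated text. Standard open-source implementations, not necessarily the
# authors' exact tooling.
from rouge_score import rouge_scorer
import textstat

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def evaluate(reference: str, generated: str) -> dict:
    """Score one generated summary against its expert-written reference."""
    rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure
    fk_grade = textstat.flesch_kincaid_grade(generated)
    return {"rougeL_f1": 100 * rouge_l, "flesch_kincaid_grade": fk_grade}
```

Higher ROUGE-L indicates closer overlap with the reference plain language summary, while a lower Flesch-Kincaid grade indicates more readable output.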