Automated Lay Language Summarization of Biomedical Scientific Reviews
Authors: Yue Guo, Wei Qiu, Yizhong Wang, Trevor Cohen
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct analyses of the various challenges in performing this task, including not only summarization of the key points but also explanation of background knowledge and simplification of professional language. We experiment with state-of-the-art summarization models as well as several data augmentation techniques, and evaluate their performance using both automated metrics and human assessment. Results indicate that automatically generated summaries produced using contemporary neural architectures can achieve promising quality and readability as compared with reference summaries developed for the lay public by experts (best ROUGE-L of 50.24 and Flesch-Kincaid readability score of 13.30). |
| Researcher Affiliation | Academia | Yue Guo,¹ Wei Qiu,² Yizhong Wang,² Trevor Cohen¹ — ¹Biomedical and Health Informatics, University of Washington; ²Paul G. Allen School of Computer Science, University of Washington |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We release our code at https://github.com/qiuweipku/Plain_language_summarization |
| Open Datasets | Yes | We extracted 7,805 abstracts (source), paired with their plain language versions (target), from CDSR reviews available up to March 29, 2020. The original data is downloadable via the official API: https://www.cochranelibrary.com/cdsr/reviews |
| Dataset Splits | Yes | This resulted in a set of 5,195 source-target pairs which constitutes our training set, a further 500 abstract pairs as the validation set, and 1000 more as the test set. |
| Hardware Specification | Yes | All experiments were run using a single NVIDIA Tesla V-100 GPU. |
| Software Dependencies | No | The paper names software packages such as PyTorch, Fairseq, neural-summ-cnndm-pytorch, and the BERT extractive summarization code, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The batch size was set to 4. Other hyper-parameters were set to default values. We built the BERT extractive model using code released by the authors. The learning rate was set to 2 × 10^-3 and the batch size to 140. Other hyper-parameters were set to default values. We used the Fairseq BART implementation. All BART models were trained using the Adam optimizer. The learning rate was set to 3 × 10^-5, and learning decay was applied. The minimum length of the generated summaries was set to 100, and the maximum length was set to 700. |
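
The "Experiment Setup" row reports that the BART models were trained with Fairseq using Adam at a learning rate of 3 × 10^-5, and that generated summaries were constrained to between 100 and 700 tokens. The sketch below shows how a fine-tuned checkpoint could be run for inference through Fairseq's BART hub interface with those length constraints; the checkpoint paths, decoding batch size, beam size, and length penalty are assumptions rather than values reported in the paper.

```python
# Minimal inference sketch (not the authors' released script) for a BART model
# fine-tuned on CDSR abstract -> plain-language-summary pairs with Fairseq.
import torch
from fairseq.models.bart import BARTModel

bart = BARTModel.from_pretrained(
    "checkpoints/",                      # assumed directory with the fine-tuned model
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="cdsr-bin",        # assumed binarized CDSR data directory
)
if torch.cuda.is_available():
    bart.cuda()
bart.eval()

with open("test.source") as src, open("test.hypo", "w") as out:
    abstracts = [line.strip() for line in src]
    # Decode in small batches; beam size and length penalty are assumed values.
    for i in range(0, len(abstracts), 8):
        batch = abstracts[i : i + 8]
        with torch.no_grad():
            summaries = bart.sample(
                batch,
                beam=4,
                lenpen=2.0,
                min_len=100,     # minimum summary length reported in the paper
                max_len_b=700,   # maximum summary length reported in the paper
                no_repeat_ngram_size=3,
            )
        for s in summaries:
            out.write(s + "\n")
```

The abstract quoted in the "Research Type" row evaluates outputs with ROUGE-L (best 50.24) and the Flesch-Kincaid readability score (13.30). A sketch of computing both measures with the widely used rouge-score and textstat packages follows; the paper does not state which metric implementations were used, so the package choice is an assumption.

```python
# Sketch of the two automated measures cited above: ROUGE-L overlap with the
# expert-written plain language summary and Flesch-Kincaid grade level of the
# generated text. Standard open-source implementations, not necessarily the
# authors' exact tooling.
from rouge_score import rouge_scorer
import textstat

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def evaluate(reference: str, generated: str) -> dict:
    """Score one generated summary against its expert-written reference."""
    rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure
    fk_grade = textstat.flesch_kincaid_grade(generated)
    return {"rougeL_f1": 100 * rouge_l, "flesch_kincaid_grade": fk_grade}
```

Higher ROUGE-L indicates closer overlap with the reference plain language summary, while a lower Flesch-Kincaid grade indicates more readable output.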