A Semantic QA-Based Approach for Text Summarization Evaluation

Authors: Ping Chen, Fei Wu, Tong Wang, Wei Ding

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiment using the 2007 DUC summarization corpus clearly shows promising results. Our experiments on text summarization evaluation showed promising results.
Researcher Affiliation | Academia | Ping Chen, Fei Wu, Tong Wang, Wei Ding, University of Massachusetts Boston, ping.chen@umb.edu
Pseudocode | No | The paper describes the system architecture and process flow with diagrams but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using existing open-source frameworks (Open Ephyra) but does not state that the authors' own implementation of the methodology is open source or provide a link to it.
Open Datasets | Yes | To test our prototype system, we use the corpus from the Document Understanding Conference (DUC) 2007.
Dataset Splits | No | The paper uses the DUC 2007 corpus for evaluation, but because the system is an evaluation framework rather than a trainable model, it does not specify traditional training/validation/test splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used to run the experiments.
Software Dependencies | No | The paper mentions adapting Heilman's (2011) question generation system and using the Open Ephyra QA framework, but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | To ensure that answers generated by the QA system are mostly correct and to minimize the chance of false answers, the answer confidence threshold was set to a high value (0.8 or 0.9, as shown in Table 1). All questions were limited to WH factoid questions shorter than a length threshold; with the limit set to fewer than 20 words, an average of 1193 questions were generated per topic of 25 documents.
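
The Experiment Setup row amounts to two simple filters: keep only short WH factoid questions from the question generator, and keep only answers whose QA confidence clears a high threshold. The sketch below illustrates that filtering logic under stated assumptions; the function names and the driver are hypothetical stand-ins, since the paper's actual pipeline adapts Heilman's (2011) QG system and the Open Ephyra QA framework. Only the threshold values (0.8 or 0.9; question length under 20 words) come from the row above.

```python
# Minimal sketch of the two filters described in the Experiment Setup row.
# keep_question / keep_answer are hypothetical names, not the paper's API.

WH_WORDS = {"who", "what", "when", "where", "which", "why", "whose", "whom", "how"}
MAX_QUESTION_WORDS = 20     # "question length to be less than 20 words"
CONFIDENCE_THRESHOLD = 0.8  # paper reports 0.8 or 0.9 (its Table 1)

def keep_question(question: str) -> bool:
    """Keep only WH factoid questions shorter than the length threshold."""
    words = question.strip().split()
    return bool(words) and len(words) < MAX_QUESTION_WORDS and words[0].lower() in WH_WORDS

def keep_answer(confidence: float) -> bool:
    """Keep only answers whose QA confidence clears the threshold."""
    return confidence >= CONFIDENCE_THRESHOLD

if __name__ == "__main__":
    print(keep_question("Who chaired the 2007 DUC evaluation?"))  # True
    print(keep_question("Summarize the main events."))            # False: not a WH question
    print(keep_answer(0.92))                                      # True
    print(keep_answer(0.45))                                      # False
```

Setting the threshold this high trades recall for precision: fewer question-answer pairs survive, but the surviving answers are far less likely to be false, which matters when the answers themselves serve as the evaluation signal.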