A Semantic QA-Based Approach for Text Summarization Evaluation
Authors: Ping Chen, Fei Wu, Tong Wang, Wei Ding
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiment using 2007 DUC summarization corpus clearly shows promising results. Our experiments on text summarization evaluation showed promising results. |
| Researcher Affiliation | Academia | Ping Chen, Fei Wu, Tong Wang, Wei Ding (University of Massachusetts Boston); ping.chen@umb.edu |
| Pseudocode | No | The paper describes the system architecture and process flow with diagrams but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions utilizing an existing open-source framework (Open Ephyra) but does not state that the authors' own implementation of the methodology is open source, nor does it provide a link to it. |
| Open Datasets | Yes | To test our prototype system, we use the corpus from Document Understanding Conference (DUC) 2007. |
| Dataset Splits | No | The paper uses the DUC 2007 corpus for evaluation, but since the system is an evaluation framework rather than a trainable model, no training/validation/test splits are specified. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions adapting Heilman's (2011) QG system and using the Open Ephyra QA framework, but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | To ensure the answers generated by the QA system are mostly correct, the confidence threshold was set to a very high value (0.8 or 0.9, as shown in Table 1), minimizing the possibility of generating false answers. All questions were limited to WH factoid questions shorter than a certain length threshold; with questions limited to fewer than 20 words, the average number of questions generated for a topic with 25 documents is 1193 (see the sketch after the table). |
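
As a concrete illustration of the filtering rules quoted in the Experiment Setup row, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `WH_WORDS` set, and the data shapes are assumptions; only the WH-factoid restriction, the 20-word length limit, and the 0.8/0.9 confidence threshold come from the paper.

```python
# Minimal sketch of the question/answer filtering described in the
# Experiment Setup row. Function names and data shapes are assumptions;
# only the WH-factoid restriction, the 20-word length limit, and the
# 0.8/0.9 confidence threshold come from the paper.

WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
MAX_QUESTION_WORDS = 20     # questions must be shorter than 20 words
CONFIDENCE_THRESHOLD = 0.8  # paper reports 0.8 or 0.9 (Table 1)


def is_wh_factoid(question: str) -> bool:
    """Keep only short WH factoid questions, per the paper's setup."""
    words = question.strip().rstrip("?").split()
    return 0 < len(words) < MAX_QUESTION_WORDS and words[0].lower() in WH_WORDS


def filter_questions(questions: list[str]) -> list[str]:
    """Discard any generated question that is not a short WH factoid."""
    return [q for q in questions if is_wh_factoid(q)]


def confident_answers(qa_triples, threshold: float = CONFIDENCE_THRESHOLD):
    """Keep (question, answer) pairs whose QA confidence meets the
    threshold, minimizing the chance of false answers."""
    return [(q, a) for q, a, conf in qa_triples if conf >= threshold]


# Hypothetical usage: the first question is kept (short WH factoid),
# the second is dropped (imperative, not a WH question).
questions = filter_questions([
    "Who chaired the DUC 2007 evaluation?",
    "Summarize the main findings of this topic.",
])
```

The sketch only reproduces the two filters the paper states explicitly; the upstream question generation and the QA step itself (Heilman's QG system and Open Ephyra in the paper) are omitted.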