QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters
Authors: Yushan Liu, Zili Wang, Ruifeng Yuan
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of existing summarization models on the QuerySum dataset for a better understanding of the dataset. |
| Researcher Affiliation | Collaboration | Yushan Liu (Fudan University), Zili Wang (INF Technology (Shanghai) Co., Ltd.), Ruifeng Yuan (The Hong Kong Polytechnic University); yushanliu21@m.fudan.edu.cn, ziliwang.do@gmail.com, ruifeng.yuan@connect.polyu.hk |
| Pseudocode | No | The paper describes the model architecture and its components but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper states that the dataset is available on GitHub (https://github.com/613lys/QuerySum); this link is for the dataset, not the source code for the proposed model or methodology. |
| Open Datasets | Yes | We build a new large-scale query-focused multi-document summarization dataset called QuerySum. ... Our dataset is available on GitHub: https://github.com/613lys/QuerySum |
| Dataset Splits | Yes | Following previous work, we randomly extract 15% of the data samples as the validation set and another 15% as the test set. (See the splitting sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU specifications, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using PEGASUS-LARGE/BASE and the Adam optimizer, but does not provide specific version numbers for software dependencies such as programming languages or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All the hyperparameters are adjusted on the development set. For optimization, the batch size is set to 16. We use dropout with a probability of 0.1 and label smoothing (Szegedy et al. 2015) with a smoothing factor of 0.1. The optimizer is Adam (Kingma and Ba 2014) with a learning rate of 0.001. In addition, we apply warm-up over the first 10% of steps and a learning rate decay of 0.95. (See the training-setup sketch after the table.) |
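
The split reported in the Dataset Splits row is a simple random partition. Below is a minimal Python sketch of such a 70/15/15 split; the function name, the `samples` placeholder, and the seed are illustrative assumptions, not code from the paper.

```python
import random

def split_dataset(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Randomly split examples into train/validation/test (70/15/15 by default).

    `samples` stands in for the loaded QuerySum examples; the seed is an
    arbitrary choice for reproducibility, not a value reported in the paper.
    """
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_frac)
    n_test = int(len(samples) * test_frac)
    validation = samples[:n_val]
    test = samples[n_val:n_val + n_test]
    train = samples[n_val + n_test:]
    return train, validation, test
```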
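
The hyperparameters quoted in the Experiment Setup row can be assembled into a training configuration roughly as follows. This is a hedged sketch, assuming Hugging Face `transformers` for PEGASUS and plain PyTorch for the optimizer and scheduler; the checkpoint name, the total step count, and the per-1,000-step interpretation of the 0.95 learning-rate decay are assumptions not stated in the paper.

```python
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Assumption: the public PEGASUS-LARGE checkpoint; the paper only names PEGASUS-LARGE/BASE.
checkpoint = "google/pegasus-large"
tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
model = PegasusForConditionalGeneration.from_pretrained(checkpoint, dropout=0.1)  # dropout 0.1

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, learning rate 0.001
loss_fn = torch.nn.CrossEntropyLoss(
    ignore_index=tokenizer.pad_token_id,
    label_smoothing=0.1,  # label-smoothing factor 0.1
)

batch_size = 16                         # batch size reported in the paper
total_steps = 50_000                    # assumption: depends on dataset size and epochs
warmup_steps = int(0.1 * total_steps)   # warm-up over the first 10% of steps

def lr_lambda(step: int) -> float:
    # Linear warm-up, then exponential decay; "learning rate decay of 0.95" is
    # interpreted here as multiplying the rate by 0.95 every 1,000 steps,
    # which is an assumption rather than a detail given in the paper.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return 0.95 ** ((step - warmup_steps) // 1000)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

In a training loop, `scheduler.step()` would be called once per optimizer step so that the warm-up fraction lines up with the assumed total step count.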