QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters

Authors: Yushan Liu, Zili Wang, Ruifeng Yuan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of the QuerySum dataset using existing summarization models for a better understanding of the dataset.
Researcher Affiliation | Collaboration | Yushan Liu (Fudan University), Zili Wang* (INF Technology (Shanghai) Co., Ltd.), Ruifeng Yuan (The Hong Kong Polytechnic University); yushanliu21@m.fudan.edu.cn, ziliwang.do@gmail.com, ruifeng.yuan@connect.polyu.hk
Pseudocode | No | The paper describes the model architecture and its components but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper states, 'Our dataset is available on github (https://github.com/613lys/QuerySum)'. This link is for the dataset, not the source code for the proposed model or methodology.
Open Datasets | Yes | We build a new large-scale query-focused multi-document summarization dataset called QuerySum. ... Our dataset is available on github (https://github.com/613lys/QuerySum).
Dataset Splits | Yes | Following previous work, we randomly extract 15% of the data samples as the validation set and another 15% as the test set. (A split sketch follows after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU specifications, or cloud computing instances.
Software Dependencies | No | The paper mentions using PEGASUS-LARGE/BASE and the Adam optimizer, but does not provide version numbers for software dependencies such as programming languages or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | All the hyperparameters are adjusted on the development set. For optimization, the batch size is set to 16. We use dropout with a probability of 0.1 and label smoothing (Szegedy et al. 2015) with a smoothing factor of 0.1. The optimizer is Adam (Kingma and Ba 2014) with a learning rate of 0.001. In addition, we apply warm-up over the first 10% of steps and a learning rate decay of 0.95. (A configuration sketch follows after this table.)
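The Dataset Splits row reports only the ratios (15% validation, 15% test, remainder for training). The sketch below is a minimal, assumed reconstruction of such a split; the paper does not describe the shuffling procedure or seed, and the function name split_dataset is hypothetical.

```python
# Hypothetical sketch of the 70/15/15 split described under "Dataset Splits".
# The paper only says 15% of samples go to validation and another 15% to test;
# the shuffle, seed, and function name here are assumptions.
import random

def split_dataset(samples, val_frac=0.15, test_frac=0.15, seed=42):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_frac)
    n_test = int(len(samples) * test_frac)
    val = samples[:n_val]
    test = samples[n_val:n_val + n_test]
    train = samples[n_val + n_test:]
    return train, val, test
```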
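The hyperparameters quoted under Experiment Setup map onto a standard fine-tuning configuration. The sketch below assumes PyTorch and Hugging Face Transformers (the paper states neither framework nor versions, which is why Software Dependencies is marked No); the checkpoint name, total step count, and decay granularity are assumptions, and only the numbers themselves (batch size 16, dropout 0.1, label smoothing 0.1, Adam with learning rate 0.001, warm-up over the first 10% of steps, decay factor 0.95) come from the quoted text.

```python
# Hedged sketch of the quoted training setup, assuming PyTorch + Hugging Face
# Transformers (the paper does not name its framework or versions).
import torch
from transformers import PegasusForConditionalGeneration

# PEGASUS-LARGE is mentioned in the paper; the checkpoint name is an assumption.
model = PegasusForConditionalGeneration.from_pretrained(
    "google/pegasus-large", dropout=0.1)           # dropout probability 0.1

batch_size = 16                                    # "the batch size is set to 16"
criterion = torch.nn.CrossEntropyLoss(
    label_smoothing=0.1,                           # smoothing factor 0.1
    ignore_index=model.config.pad_token_id)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam, lr 0.001

total_steps = 50_000                               # hypothetical; not reported
warmup_steps = int(0.1 * total_steps)              # warm-up over the first 10% of steps

def lr_lambda(step: int) -> float:
    # Linear warm-up, then a 0.95 multiplicative decay applied once per
    # warmup_steps-sized block; the decay granularity is an assumption, since
    # the paper only states "learning rate decay of 0.95".
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return 0.95 ** ((step - warmup_steps) // warmup_steps + 1)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```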