TGSum: Build Tweet Guided Multi-Document Summarization Dataset
Authors: Ziqiang Cao, Chengyao Chen, Wenjie Li, Sujian Li, Furu Wei, Ming Zhou
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. |
| Researcher Affiliation | Collaboration | 1Department of Computing, The Hong Kong Polytechnic University, Hong Kong 2Key Laboratory of Computational Linguistics, Peking University, MOE, China 3Microsoft Research, Beijing, China |
| Pseudocode | No | The paper provides mathematical formulations and constraints for the ILP solution but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "We release this dataset for further research1," with footnote 1 pointing to "http://www4.comp.polyu.edu.hk/ cszqcao/". This explicitly refers to the *dataset* being released, not the source code for the methodology or implementation. |
| Open Datasets | Yes | The paper states, "For instance, the generic multi-document summarization task aims to summarize a cluster of documents telling the same topic. In this task, the most widely-used datasets are published by Document Understanding Conferences2 (DUC) in 01, 02 and 04" (Footnote 2: http://duc.nist.gov/). It also states, "We release this dataset for further research1" (Footnote 1: http://www4.comp.polyu.edu.hk/ cszqcao/). |
| Dataset Splits | No | The paper states that DUC datasets are used as test sets and TGSum as an "extra training resource," but it does not provide specific percentages, sample counts, or explicit splitting methodology for training, validation, or test sets for its experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions "open Python package newspaper3" and "IBM CPLEX Optimizer5" but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions features used for the SVR summarizer (TF, LENGTH, STOP-RATIO) and states the summary length is set to 100 words. However, it does not provide specific hyperparameter values for the SVR model (e.g., kernel choice, C, or ε) or other detailed training configuration needed to reproduce the experiments. |
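To make the reproducibility gap concrete: the paper names only three surface features for its SVR summarizer (TF, LENGTH, STOP-RATIO) without defining them precisely. The sketch below is one plausible pure-Python reading of those features, not the authors' implementation; the stopword list, normalization, and tokenizer are all assumptions.

```python
from collections import Counter

# Toy stopword list -- an assumption; the paper does not specify one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on"}

def sentence_features(sentence, term_counts, total_tokens):
    """One plausible reading of the paper's three features:
    TF         -- average cluster-level term frequency of the sentence's words
    LENGTH     -- number of tokens in the sentence
    STOP-RATIO -- fraction of tokens that are stopwords
    """
    tokens = sentence.lower().split()
    tf = sum(term_counts[t] for t in tokens) / max(total_tokens, 1)
    length = len(tokens)
    stop_ratio = sum(t in STOPWORDS for t in tokens) / max(length, 1)
    return [tf, length, stop_ratio]

def featurize(sentences):
    """Build the feature vectors an SVR regressor would be trained on."""
    all_tokens = [t for s in sentences for t in s.lower().split()]
    counts = Counter(all_tokens)
    total = len(all_tokens)
    return [sentence_features(s, counts, total) for s in sentences]
```

Even with this sketch, the missing SVR hyperparameters (kernel, C, ε) leave the training step under-specified, which is why the Experiment Setup variable is scored "No".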