Improving Multi-Document Summarization via Text Classification
Authors: Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on DUC generic multi-document summarization datasets show that TCSum can achieve state-of-the-art performance without using any hand-crafted features and has the capability to catch the variations of summary styles with respect to different text categories. |
| Researcher Affiliation | Collaboration | (1) Department of Computing, The Hong Kong Polytechnic University, Hong Kong; (2) Hong Kong Polytechnic University Shenzhen Research Institute, China; (3) Key Laboratory of Computational Linguistics, Peking University, MOE, China; (4) Microsoft Research, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and processes using text and a diagram (Figure 1), but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The most commonly used evaluation corpora for summarization are the ones published by the Document Understanding Conferences (DUC) and Text Analytics Conferences (TAC). ... The NYT corpus contains over 1.8 million articles published and annotated by the New York Times. ... https://catalog.ldc.upenn.edu/LDC2008T19 |
| Dataset Splits | Yes | We conduct three-fold validation. The model is trained on two years' data and tested on the remaining year's. ... The cross validation shows that the learned classification model of TCSum achieves over 85% accuracy on this dataset. (A fold-assignment sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | Yes | For evaluation, we use ROUGE (Lin 2004), which has been regarded as a standard automatic evaluation metric since 2004. ROUGE-1.5.5 with options: -n 2 -m -u -c 95 -x -r 1000 -f A -p 0.5 -t 0. ... we apply the diagonal variant of AdaGrad with mini-batches (Duchi, Hazan, and Singer 2011) to update model parameters. (The full ROUGE invocation is sketched after the table.) |
| Experiment Setup | Yes | The dimension of word embeddings is set to 50, as in many previous papers (e.g., (Collobert et al. 2011)). We also set the dimension of sentence and document embeddings equivalent to the dimension of word embeddings, and the window size h to 2, to be consistent with ROUGE-2 evaluation. We empirically set the margin threshold of pairwise ranking Ω = 0.1. The initial learning rate is 0.1 and the batch size is 128. (These values are wired into the training sketch after the table.) |
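
The year-based three-fold validation quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration, not the authors' code: the year labels and the `three_fold_splits` helper are placeholders introduced here.

```python
# Minimal sketch of the paper's three-fold validation: train on two
# years of DUC data, test on the remaining year. The year labels are
# placeholders, not taken from the paper.
years = ["DUC-year-A", "DUC-year-B", "DUC-year-C"]

def three_fold_splits(years):
    """Yield (train_years, test_year): two years for training, one held out."""
    for test_year in years:
        train_years = [y for y in years if y != test_year]
        yield train_years, test_year

for train_years, test_year in three_fold_splits(years):
    print(f"train: {train_years}  test: {test_year}")
```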
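
The Software Dependencies row quotes the exact ROUGE-1.5.5 option string; a wrapper along the following lines would reproduce that invocation. The script, data, and config paths are hypothetical placeholders, while the option flags are copied verbatim from the paper.

```python
import subprocess

# Hypothetical paths: point these at a local ROUGE-1.5.5 installation.
ROUGE_SCRIPT = "ROUGE-1.5.5/ROUGE-1.5.5.pl"
ROUGE_DATA = "ROUGE-1.5.5/data"   # ROUGE's bundled resources (-e)
CONFIG_XML = "rouge_config.xml"   # lists peer/model summary pairs (-a)

# Options exactly as reported: ROUGE-1/2 (-n 2), stemming (-m), skip-bigrams
# with unigrams (-u), 95% confidence intervals (-c 95), 1000 resamples (-r 1000).
options = "-n 2 -m -u -c 95 -x -r 1000 -f A -p 0.5 -t 0".split()

result = subprocess.run(
    ["perl", ROUGE_SCRIPT, "-e", ROUGE_DATA, *options, "-a", CONFIG_XML],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```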
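
The Experiment Setup row pins down the reported hyperparameters; the sketch below wires them into a generic margin-based pairwise ranking update. PyTorch and the linear scorer are assumptions made here for illustration (neither reproduces TCSum itself), but the margin Ω = 0.1, learning rate 0.1, batch size 128, and 50-dimensional embeddings are as reported.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's experiment setup.
EMBED_DIM = 50       # word / sentence / document embedding dimension
MARGIN = 0.1         # pairwise ranking margin Ω
LEARNING_RATE = 0.1
BATCH_SIZE = 128

# Stand-in scorer: TCSum's actual sentence scorer is not reproduced here.
scorer = nn.Linear(EMBED_DIM, 1)

# Diagonal AdaGrad with mini-batches (Duchi, Hazan, and Singer 2011).
optimizer = torch.optim.Adagrad(scorer.parameters(), lr=LEARNING_RATE)

# Hinge-style pairwise ranking loss with margin Ω: positive sentences
# should outscore negative ones by at least the margin.
ranking_loss = nn.MarginRankingLoss(margin=MARGIN)

# One illustrative update on random "sentence embeddings".
pos = torch.randn(BATCH_SIZE, EMBED_DIM)
neg = torch.randn(BATCH_SIZE, EMBED_DIM)
target = torch.ones(BATCH_SIZE)  # +1 means pos should rank higher

loss = ranking_loss(scorer(pos).squeeze(-1), scorer(neg).squeeze(-1), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```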