Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that our CAST method outperforms various advanced summarization methods.
Researcher Affiliation | Academia | Department of Computing, The Hong Kong Polytechnic University {cssqliu, csjcao, csryang, cszwen}@comp.polyu.edu.hk
Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled "Pseudocode" or "Algorithm", nor does it present structured steps formatted like code or an algorithm.
Open Source Code | No | The paper provides a link for the dataset ("Our dataset: https://github.com/StevenLau6/BigSurvey"), but it does not state that the source code for the proposed CAST method or the experimental setup is publicly available, nor does it provide a link to such code.
Open Datasets | Yes | Our dataset: https://github.com/StevenLau6/BigSurvey
Dataset Splits | Yes | We split the training (80%), validation (10%), and test (10%) sets. (A split sketch follows the table.)
Hardware Specification | Yes | All the models are trained on one NVIDIA RTX8000.
Software Dependencies | No | The paper mentions using "Hugging Face's Transformers [Wolf et al., 2020]" and "fairseq [Ott and others, 2019]" for its implementations, but it does not provide version numbers for these libraries or other ancillary software components.
Experiment Setup | Yes | The vocabulary's maximum size is set as 50,265 for these abstractive summarization models, while the BERT-based classifiers use 30,522 as default. We use dropout with the probability 0.1. The optimizer is Adam with β1 = 0.9 and β2 = 0.999. Summarization models use a learning rate of 5e-5, while the classifiers use 2e-5. We also adopt learning rate warmup and decay. During decoding, we use beam search with a beam size of 5. Trigram blocking is used to reduce repetitions. (A configuration sketch follows the table.)
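Dataset split sketch. The quoted evidence only gives the split proportions, so the following is a minimal Python sketch of an 80/10/10 train/validation/test split. The file name bigsurvey.jsonl, the JSON-lines format, and the seeded shuffle are assumptions for illustration; the released BigSurvey repository may ship predefined splits, which should be preferred when reproducing the paper.

```python
import json
import random

def split_dataset(path, seed=42):
    """Shuffle records and return 80%/10%/10% train/validation/test lists."""
    with open(path, "r", encoding="utf-8") as f:
        records = [json.loads(line) for line in f]

    random.Random(seed).shuffle(records)
    n_train = int(0.8 * len(records))   # 80% training
    n_val = int(0.1 * len(records))     # 10% validation

    return (
        records[:n_train],
        records[n_train:n_train + n_val],
        records[n_train + n_val:],      # remaining ~10% test
    )

if __name__ == "__main__":
    train, val, test = split_dataset("bigsurvey.jsonl")  # hypothetical file name
    print(len(train), len(val), len(test))
```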
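Configuration sketch. The hyperparameters quoted in the "Experiment Setup" row can be expressed as a training and decoding configuration. Below is a hedged sketch using Hugging Face Transformers, which the paper cites for its implementation; the model checkpoint (facebook/bart-large), batch size, warmup step count, and maximum output length are assumptions not stated in the quoted setup, and only the dropout, Adam betas, learning rate, beam size, and trigram blocking come from the paper.

```python
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    Seq2SeqTrainingArguments,
)

# facebook/bart-large uses a 50,265-token vocabulary, matching the reported size.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
model.config.dropout = 0.1  # dropout probability reported in the paper

training_args = Seq2SeqTrainingArguments(
    output_dir="./checkpoints",
    learning_rate=5e-5,             # summarization models; classifiers use 2e-5
    adam_beta1=0.9,
    adam_beta2=0.999,
    warmup_steps=1000,              # warmup and decay are used; exact steps not reported
    lr_scheduler_type="linear",     # assumed decay schedule
    per_device_train_batch_size=1,  # assumption; batch size not stated in the quote
)

# Decoding: beam search with beam size 5 and trigram blocking to reduce repetition.
inputs = tokenizer("Example source document ...", return_tensors="pt", truncation=True)
summary_ids = model.generate(
    **inputs,
    num_beams=5,
    no_repeat_ngram_size=3,  # trigram blocking
    max_length=256,          # assumption; target length not given in the quote
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```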