Generating a Structured Summary of Numerous Academic Papers: Dataset and Method
Authors: Shuaiqi LIU, Jiannong Cao, Ruosong Yang, Zhiyuan Wen
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that our CAST method outperforms various advanced summarization methods. |
| Researcher Affiliation | Academia | Department of Computing, The Hong Kong Polytechnic University {cssqliu, csjcao, csryang, cszwen}@comp.polyu.edu.hk |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps formatted like code or an algorithm. |
| Open Source Code | No | The paper provides a link for the dataset ('Our dataset: https://github.com/StevenLau6/BigSurvey'), but it does not state that the source code for the proposed CAST methodology or the experimental setup is made publicly available or provide a link to such code. |
| Open Datasets | Yes | Our dataset: https://github.com/StevenLau6/BigSurvey |
| Dataset Splits | Yes | We split the training (80%), validation (10%), and test (10%) sets. |
| Hardware Specification | Yes | All the models are trained on one NVIDIA RTX8000. |
| Software Dependencies | No | The paper mentions using 'Hugging Face's Transformers [Wolf et al., 2020]' and 'fairseq [Ott and others, 2019]' for implementations, but it does not provide specific version numbers for these software libraries or other ancillary software components. |
| Experiment Setup | Yes | The vocabulary's maximum size is set as 50,265 for these abstractive summarization models, while the BERT-based classifiers use 30,522 as default. We use dropout with the probability 0.1. The optimizer is Adam with β1=0.9 and β2=0.999. Summarization models use learning rate of 5e-5, while the classifiers use 2e-5. We also adopt the learning rate warmup and decay. During decoding, we use beam search with a beam size of 5. Trigram blocking is used to reduce repetitions. |
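The Dataset Splits row reports an 80%/10%/10% train/validation/test split, but the quoted text does not describe how the split was made. Below is a minimal sketch of such a split, assuming a seeded random shuffle of a JSON-lines dump of the dataset; the file name `bigsurvey.jsonl` and the seed are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: 80/10/10 split of the BigSurvey samples.
# The shuffle, seed, and file name are assumptions; the paper only gives the ratios.
import json
import random

with open("bigsurvey.jsonl", "r", encoding="utf-8") as f:  # hypothetical file name
    samples = [json.loads(line) for line in f]

random.seed(42)          # assumed seed; not specified in the paper
random.shuffle(samples)  # assumed random split

n = len(samples)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set = samples[:n_train]
val_set = samples[n_train:n_train + n_val]
test_set = samples[n_train + n_val:]
```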
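The Experiment Setup row lists hyperparameters but no code is released, so the sketch below shows one way to wire them up with Hugging Face Transformers, which the paper says it used. The choice of `facebook/bart-large` as the backbone (consistent with the reported 50,265-token vocabulary), the warmup and total step counts, and the maximum output length are assumptions; the dropout of 0.1, Adam with β1=0.9 and β2=0.999, the 5e-5 learning rate, beam size 5, and trigram blocking come from the row above.

```python
# Hedged sketch of the reported training/decoding configuration.
import torch
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    get_linear_schedule_with_warmup,
)

model_name = "facebook/bart-large"  # assumed backbone with a 50,265-token vocabulary
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name, dropout=0.1)  # dropout 0.1 as reported

# Adam with beta1=0.9, beta2=0.999 and a 5e-5 learning rate for the summarizer.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, betas=(0.9, 0.999))

# Learning-rate warmup followed by linear decay; the step counts are assumptions.
# scheduler.step() would be called after each optimizer step in the training loop.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1000, num_training_steps=50000
)

# Decoding: beam search with beam size 5 and trigram blocking to reduce repetition.
inputs = tokenizer("Example source document ...", return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=5,
    no_repeat_ngram_size=3,  # trigram blocking
    max_length=300,          # assumed output-length limit
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

`no_repeat_ngram_size=3` is the Transformers-native form of trigram blocking; the BERT-based classifiers mentioned in the same row would be set up analogously with a 2e-5 learning rate and the default 30,522-token vocabulary.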