Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Improving Faithfulness in Abstractive Summarization
Authors: Xiuying Chen, Mingzhe Li, Xin Gao, Xiangliang Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two benchmark summarization datasets, CNN/DM and XSum, demonstrate that our model significantly outperforms strong baselines. The evaluation of factual consistency also shows that our model generates more faithful summaries than baselines. |
| Researcher Affiliation | Collaboration | Xiuying Chen1 Mingzhe Li2 Xin Gao1,3 Xiangliang Zhang4,1 1 Computational Bioscience Research Center, King Abdullah University of Science and Technology 2 Ant Group 3 SDAIA-KAUST AI 4 University of Notre Dame EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes its model architecture and loss functions using mathematical equations and descriptive text, but it does not include a formal 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | https://github.com/iriscxy/FES and "We include our code implementation in supplemental material." |
| Open Datasets | Yes | We validate the effectiveness of our FES model by conducting extensive experiments on public benchmark CNN/DM [12] and XSum [13] datasets. |
| Dataset Splits | Yes | For training the summarization model... For validation and test, we use the pairs selected by the extraction model. |
| Hardware Specification | Yes | We implement our experiments in Huggingface [42] on 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like Huggingface, BART (facebook/bart-large), and PEGASUS (google/pegasus-xsum), but does not specify their version numbers. |
| Experiment Setup | Yes | The QA number is set to 8 unless otherwise stated. We use Adam optimizer with ϵ as 1e-8 and β as (0.9, 0.999). The learning rate is set to 3e-5. The warm-up is set to 500 steps for CNN/DM and 125 for XSum. The batch size is set to 8 with gradient accumulation steps of 4. The beam size is set to 6 for CNN/DM and 8 for XSum. |
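The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. The values below are taken directly from the paper's reported setup; the dataclass name, field names, and the derived effective-batch-size property are illustrative, not from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FESTrainingConfig:
    """Hyperparameters reported for FES; class structure is illustrative."""
    num_qa: int = 8                      # QA number (default unless otherwise stated)
    adam_eps: float = 1e-8               # Adam epsilon
    adam_betas: tuple = (0.9, 0.999)     # Adam betas
    learning_rate: float = 3e-5
    warmup_steps_cnndm: int = 500        # warm-up steps for CNN/DM
    warmup_steps_xsum: int = 125         # warm-up steps for XSum
    batch_size: int = 8
    grad_accum_steps: int = 4
    beam_size_cnndm: int = 6
    beam_size_xsum: int = 8

    @property
    def effective_batch_size(self) -> int:
        # Gradient accumulation multiplies the per-step batch size.
        return self.batch_size * self.grad_accum_steps

cfg = FESTrainingConfig()
print(cfg.effective_batch_size)  # 8 * 4 = 32
```

With gradient accumulation of 4 over a batch size of 8, the optimizer sees an effective batch of 32 examples per update, which is the figure implied by the quoted setup.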