Learning from Failure: Improving Meeting Summarization without Good Samples

Authors: Ke Wang, Xiutian Zhao, Wei Peng

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that our method is effective in improving meeting summarization on both English and Chinese corpora while requiring less annotated data and fewer training resources than existing alignment methods. Additionally, we preliminarily explore the transferability of our approach to machine translation tasks and demonstrate its potential for use in other domains. We conduct experiments to test the hypothesized effectiveness of our method on an in-house dataset as well as public English and Chinese corpora, and both automatic and human evaluation yield positive results.
Researcher Affiliation | Industry | Ke Wang*, Xiutian Zhao*, Wei Peng (Huawei IT Innovation and Research Center), {wangke215, zhaoxiutian, peng.wei1}@huawei.com
Pseudocode | No | The paper describes the method using textual steps and a high-level diagram (Figure 2), but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using pre-trained models under open-source licenses (e.g., "Our usage of the pre-trained model ChatGLM2 (Zeng et al. 2022) is under Apache-2.0 license"), but does not state that the code for the proposed Score Tuning method is open-source or provided.
Open Datasets | Yes | Additionally, we augment the VCSum dataset (Wu et al. 2023) as mentioned in Related Work. Specifically, we use the TL;DR dataset (Völske et al. 2017), which contains about 3 million posts from reddit.com across a variety of topics, together with summaries of the posts written by the original posters. To investigate the potential transferability of our method to other NLP tasks, we also test a variation of Score Tuning on the CWMT 2018 dataset (Bojar et al. 2018) for the machine translation task. (A hedged loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a "test set" for the VCSum dataset but does not provide specific details on how the training, validation, and test splits were performed (e.g., percentages or absolute counts) for any of the datasets used; a seeded-split sketch showing what such a disclosure could look like follows the table.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments; an environment-logging sketch that would capture such details follows the table.
Software Dependencies | No | The paper mentions an ASR system (iFLYTEK), the ChatGLM-6B and Ziya-LLaMA-7B-Reward models, and the R irr package, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | No | The paper describes the three steps of Score Tuning and the loss functions used (Confidence-Aware Loss, Hindsight-Score Loss) but does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer details; a hedged loss-weighting sketch follows the table.
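
The sketches below expand on the rows above. First, for the Open Datasets row: a minimal loading sketch for the public TL;DR corpus, assuming the Hugging Face mirror webis/tldr-17 of Völske et al. (2017). The paper does not say which distribution the authors used, so the dataset identifier and field names here are assumptions.

    # Hedged sketch: load the TL;DR corpus (Völske et al. 2017).
    # Assumes the Hugging Face mirror "webis/tldr-17"; the paper does not
    # name a distribution. The corpus holds ~3M posts, so the download is
    # large. Recent datasets versions require trust_remote_code=True here.
    from datasets import load_dataset

    tldr = load_dataset("webis/tldr-17", split="train", trust_remote_code=True)
    example = tldr[0]
    print(example["content"][:200])  # the Reddit post body
    print(example["summary"])        # the poster's own TL;DR summary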
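For the Dataset Splits row: since no split procedure is reported, the sketch below shows the kind of seeded, documented split a re-run could adopt. The 80/10/10 ratios and the seed are assumptions, not the authors' protocol.

    # Hedged sketch: a reproducible 80/10/10 split with a fixed seed.
    # Ratios and seed are assumptions; the paper reports no split details.
    import random

    def split_dataset(items, seed=42, train_frac=0.8, valid_frac=0.1):
        rng = random.Random(seed)
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * train_frac)
        n_valid = int(len(shuffled) * valid_frac)
        return (shuffled[:n_train],
                shuffled[n_train:n_train + n_valid],
                shuffled[n_train + n_valid:])

    train_set, valid_set, test_set = split_dataset(range(1000))
    print(len(train_set), len(valid_set), len(test_set))  # 800 100 100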
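For the Hardware Specification and Software Dependencies rows: a generic environment dump like the one below would close both gaps. Nothing here comes from the paper, and the use of PyTorch is itself an assumption based on the models involved.

    # Hedged sketch: record hardware and library versions alongside results.
    # None of these details appear in the paper; this shows what to capture.
    import platform, sys
    import torch  # assumed; the ChatGLM/Ziya models are PyTorch-based

    print("python :", sys.version.split()[0])
    print("os     :", platform.platform())
    print("torch  :", torch.__version__)
    if torch.cuda.is_available():
        print("gpu    :", torch.cuda.get_device_name(0))
        print("cuda   :", torch.version.cuda)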
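For the Experiment Setup row: the paper names a Confidence-Aware Loss and a Hindsight-Score Loss, but this report cannot recover their exact forms. As a purely illustrative stand-in, the sketch below weights a token-level negative log-likelihood by a per-sample confidence score, one common way such a loss is built; every detail is an assumption, and this is not the paper's formulation.

    # Hedged sketch: a confidence-weighted NLL, NOT the paper's exact loss.
    # `confidence` is a hypothetical per-sample score in [0, 1].
    import torch
    import torch.nn.functional as F

    def confidence_weighted_nll(logits, targets, confidence):
        # logits: (batch, seq, vocab); targets: (batch, seq);
        # confidence: (batch,) per-sample weights in [0, 1]
        token_nll = F.cross_entropy(
            logits.transpose(1, 2), targets, reduction="none"
        )                                    # (batch, seq)
        per_sample = token_nll.mean(dim=1)   # (batch,)
        return (confidence * per_sample).mean()

    logits = torch.randn(2, 5, 100)
    targets = torch.randint(0, 100, (2, 5))
    print(confidence_weighted_nll(logits, targets, torch.tensor([0.9, 0.3])))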