CAR-Transformer: Cross-Attention Reinforcement Transformer for Cross-Lingual Summarization

Authors: Yuang Cai, Yuyu Yuan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach demonstrates more consistent improvement across CLS tasks compared to traditional multi-task training methods and outperforms the fine-tuned vanilla mBART by 3.67 and the best-performing multi-task training approach by 1.48 in ROUGE-L F1 score on the WikiLingua Korean-to-English CLS task. (A ROUGE-L recomputation sketch follows the table.)
Researcher Affiliation | Academia | Yuang Cai, Yuyu Yuan*, Beijing University of Posts and Telecommunications, Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, {cyang,yuanyuyu}@bupt.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It provides mathematical equations and a model architecture diagram, but no step-by-step pseudocode.
Open Source Code | Yes | The training and evaluation codes are implemented based on Hugging Face Transformers. For the detailed experiment setting and implementation, please refer to the source code in supplementary files. (A Transformers-based fine-tuning sketch follows the table.)
Open Datasets | Yes | We use WikiLingua (Ladhak et al. 2020), Global Voices (Nguyen and Daumé III 2019), and CrossSum (Bhattacharjee et al. 2021) for training and evaluation. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper mentions evaluating models 'using the validation set' and plotting 'Validation ROUGE-L scores', but it does not give the split ratios or the number of samples in the validation set, details that are required for reproducibility.
Hardware Specification | Yes | The training and evaluation procedures for each task are performed on a single NVIDIA A40 GPU.
Software Dependencies | No | The paper states: 'The training and evaluation codes are implemented based on Hugging Face Transformers.' It names the library but gives no version number for Transformers or for any other software dependency, which is necessary for reproducibility.
Experiment Setup | Yes | We truncate the source document to 512 tokens as input for the encoder, while the ground-truth summary in the target language, serving as input for the decoder, is truncated to 128 tokens. Similarly, the supervision signal for the CAR module, which comprises the ground-truth summary in the source language, is also truncated to 128 tokens. We fine-tune our approach and all baseline approaches on each CLS task for a total of 30 epochs utilizing the training set. With a training batch size of 8, we employ a gradient accumulation step of 2. (These settings are wired into the fine-tuning sketch below.)
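
The Open Datasets row names WikiLingua, Global Voices, and CrossSum but not how they were obtained. Below is a minimal sketch, not taken from the paper or its supplementary code, of pulling the WikiLingua Korean portion from the Hugging Face Hub; the `wiki_lingua` dataset name and `korean` config are assumptions about where a public copy lives.

```python
# Minimal sketch (assumed loading path, not the authors' pipeline):
# fetch the WikiLingua Korean portion from the Hugging Face Hub.
from datasets import load_dataset

# The "wiki_lingua" dataset and its "korean" config are assumptions here;
# depending on the datasets version, script-based datasets may also need
# trust_remote_code=True.
wikilingua_ko = load_dataset("wiki_lingua", "korean")
print(wikilingua_ko)  # inspect available splits and fields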
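
The Open Source Code and Experiment Setup rows fix most of the fine-tuning recipe (Hugging Face Transformers, 512-token source, 128-token target, batch size 8, gradient accumulation 2, 30 epochs). The sketch below wires those numbers into a vanilla mBART-50 baseline; the checkpoint name, language codes, column names, and output directory are placeholders, and the CAR module itself is not reproduced here.

```python
# Hedged sketch of the stated hyperparameters on a vanilla mBART baseline;
# this is NOT the CAR-Transformer itself, only the shared preprocessing and
# optimisation settings quoted in the table above.
from transformers import (
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainingArguments,
)

# Checkpoint and language codes are assumptions (Korean-to-English CLS).
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50", src_lang="ko_KR", tgt_lang="en_XX"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

def preprocess(batch):
    # Source document truncated to 512 tokens; target-language summary
    # (the labels) truncated to 128 tokens, as stated in the paper.
    # "document" and "summary" are placeholder column names.
    inputs = tokenizer(batch["document"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

args = Seq2SeqTrainingArguments(
    output_dir="mbart50_ko-en_cls",    # placeholder path
    per_device_train_batch_size=8,     # "training batch size of 8"
    gradient_accumulation_steps=2,     # "gradient accumulation step of 2"
    num_train_epochs=30,               # "a total of 30 epochs"
    predict_with_generate=True,
)
```

The 128-token source-language summary that supervises the CAR module would need an extra preprocessing field handled by the authors' released code; it is omitted from this sketch.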
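
The headline numbers in the Research Type row are ROUGE-L F1 differences. The paper does not say which ROUGE implementation was used; the snippet below shows one common way to recompute ROUGE-L F1 with the Hugging Face `evaluate` package, purely as an assumed tooling choice, with toy predictions and references for illustration.

```python
# Assumed tooling, not necessarily what the authors used: ROUGE-L F1 via the
# Hugging Face `evaluate` package.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the model summarizes the article in english"]
references = ["the article is summarized in english by the model"]
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print(scores["rougeL"])  # longest-common-subsequence F-measure
```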