Meta-Transfer Learning for Low-Resource Abstractive Summarization

Authors: Yi-Syuan Chen, Hong-Han Shuai

AAAI 2021, pp. 12692-12700

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on various summarization corpora with different writing styles and forms. The results demonstrate that our approach achieves the state-of-the-art on 6 corpora in low-resource scenarios, with only 0.7% of trainable parameters compared to previous work.
Researcher Affiliation | Academia | Yi-Syuan Chen, Hong-Han Shuai, National Chiao Tung University, Taiwan; yschen.eed09g@nctu.edu.tw, hhshuai@nctu.edu.tw
Pseudocode | No | The paper includes diagrams and descriptive text for its methods but does not provide any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | Code is available at https://github.com/YiSyuanChen/MTLABS
Open Datasets | Yes | CNN/Daily Mail (Hermann et al. 2015)... AESLC (Zhang and Tetreault 2019)... BillSum (Kornilova and Eidelman 2019)... Gigaword (Rush, Chopra, and Weston 2015)... Multi-News (Fabbri et al. 2019)... NEWSROOM (Grusky, Naaman, and Artzi 2018)... Webis-TLDR-17 (Völske et al. 2017)... Reddit-TIFU (Kim, Kim, and Kim 2019)... arXiv, PubMed (Cohan et al. 2018)... WikiHow (Koupaee and Wang 2018)
Dataset Splits | Yes | For adaptation, we finetune the meta-learned model with 10 or 100 labeled examples on the target corpus. ... For meta-validation, we use a corpus excluded from source tasks and target task, and the performance is calculated as an average of 600 batches. (See the adaptation sketch after this table.)
Hardware Specification | Yes | All of our experiments are conducted on a single NVIDIA Tesla V100 32GB GPU with PyTorch.
Software Dependencies | No | The paper mentions 'PyTorch' as software used but does not provide a specific version number. It also mentions the 'Adam (Kingma and Ba 2015) optimizer', which is an algorithm rather than a versioned software dependency.
Experiment Setup | Yes | The self-attention layer we used has 768 hidden neurons with 8 heads, and the feed-forward layer contains 3072 hidden neurons. The encoder consists of 12 transformer layers with a dropout rate of 0.1, and the decoder has 6 transformer layers with a dropout rate of 0.2. For adapter modules, the hidden size is 64. The vocabulary size is set to 30K. For meta-training, unless otherwise specified, a meta-batch includes 3 tasks, and the batch size for each task is 4. The base-learner and meta-learner are both optimized with Adam (Kingma and Ba 2015) optimizer, and the learning rate is set to 0.0002. The inner gradient step is 4, and the whole model is trained with 6K meta-steps. (See the meta-training sketch after this table.)
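
To make the reported training configuration concrete, the following is a minimal sketch of a first-order MAML-style meta-training loop using the hyperparameters quoted in the Experiment Setup row (meta-batch of 3 tasks, per-task batch size 4, 4 inner gradient steps, Adam with learning rate 0.0002, 6K meta-steps). The `model`, `tasks`, `sample_batch`, and `loss_fn` names are hypothetical placeholders, and the first-order approximation is an assumption; this is not the authors' implementation (their code is in the repository linked above).

```python
# Hedged sketch of the reported meta-training setup, not the authors' code.
# Hyperparameter values follow the "Experiment Setup" row; `model`, `tasks`,
# `sample_batch`, and `loss_fn` are hypothetical placeholders.
import copy
import random
from torch.optim import Adam

META_BATCH_TASKS = 3    # source tasks per meta-batch
TASK_BATCH_SIZE = 4     # examples per task batch
INNER_STEPS = 4         # inner gradient steps for the base-learner
LEARNING_RATE = 2e-4    # Adam, used for both base- and meta-learner
META_STEPS = 6000       # total meta-training steps


def meta_train(model, tasks, sample_batch, loss_fn):
    """First-order MAML-style loop over summarization source tasks (sketch).

    `model.parameters()` is assumed to expose only the trainable (adapter)
    parameters; `sample_batch(task, n)` is a hypothetical helper returning a
    batch of `n` (document, summary) pairs from one source corpus.
    """
    meta_opt = Adam(model.parameters(), lr=LEARNING_RATE)

    for _ in range(META_STEPS):
        meta_opt.zero_grad()

        for task in random.sample(tasks, META_BATCH_TASKS):
            # Base-learner starts from the current meta-parameters.
            learner = copy.deepcopy(model)
            inner_opt = Adam(learner.parameters(), lr=LEARNING_RATE)

            # Inner loop: adapt the base-learner on support batches.
            for _ in range(INNER_STEPS):
                support = sample_batch(task, TASK_BATCH_SIZE)
                inner_opt.zero_grad()
                loss_fn(learner, support).backward()
                inner_opt.step()

            # Outer loss on a held-out query batch; first-order approximation:
            # the adapted learner's gradients are accumulated onto the meta-model.
            query = sample_batch(task, TASK_BATCH_SIZE)
            learner.zero_grad()
            loss_fn(learner, query).backward()
            for p_meta, p_task in zip(model.parameters(), learner.parameters()):
                if p_task.grad is None:
                    continue
                grad = p_task.grad / META_BATCH_TASKS
                if p_meta.grad is None:
                    p_meta.grad = grad.clone()
                else:
                    p_meta.grad += grad

        meta_opt.step()
```

Averaging the per-task gradients over the meta-batch, as done above, is one common convention; summing them instead would only rescale the effective meta learning rate.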
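
Similarly, the low-resource adaptation described in the Dataset Splits row (fine-tuning the meta-learned model on 10 or 100 labeled target examples) might look like the sketch below. The epoch count, the example-selection strategy, and the data pipeline are assumptions for illustration, not values reported in the paper; batch size and learning rate reuse the meta-training values.

```python
# Hedged sketch of few-shot adaptation on a target corpus; epoch count and
# example selection are assumptions, not values from the paper.
from torch.optim import Adam
from torch.utils.data import DataLoader, Subset


def adapt_low_resource(model, target_dataset, loss_fn, k=10, epochs=20,
                       batch_size=4, lr=2e-4):
    """Fine-tune the meta-learned model on k labeled target examples (sketch).

    The paper uses k = 10 or 100; `epochs` and taking the first k examples
    are illustrative assumptions.
    """
    few_shot = Subset(target_dataset, range(k))
    loader = DataLoader(few_shot, batch_size=batch_size, shuffle=True)
    opt = Adam(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            loss_fn(model, batch).backward()
            opt.step()
    return model
```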