Meta-Transfer Learning for Low-Resource Abstractive Summarization
Authors: Yi-Syuan Chen, Hong-Han Shuai
AAAI 2021, pp. 12692-12700
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on various summarization corpora with different writing styles and forms. The results demonstrate that our approach achieves the state-of-the-art on 6 corpora in low-resource scenarios, with only 0.7% of trainable parameters compared to previous work. |
| Researcher Affiliation | Academia | Yi-Syuan Chen, Hong-Han Shuai; National Chiao Tung University, Taiwan; yschen.eed09g@nctu.edu.tw, hhshuai@nctu.edu.tw |
| Pseudocode | No | The paper includes diagrams and descriptive text for its methods but does not provide any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at https://github.com/YiSyuanChen/MTLABS |
| Open Datasets | Yes | CNN/Daily Mail (Hermann et al. 2015)... AESLC (Zhang and Tetreault 2019)... BillSum (Kornilova and Eidelman 2019)... Gigaword (Rush, Chopra, and Weston 2015)... Multi-News (Fabbri et al. 2019)... NEWSROOM (Grusky, Naaman, and Artzi 2018)... Webis-TLDR-17 (Völske et al. 2017)... Reddit-TIFU (Kim, Kim, and Kim 2019)... arXiv, PubMed (Cohan et al. 2018)... WikiHow (Koupaee and Wang 2018) |
| Dataset Splits | Yes | For adaptation, we finetune the meta-learned model with 10 or 100 labeled examples on the target corpus. ... For meta-validation, we use a corpus excluded from source tasks and target task, and the performance is calculated as an average of 600 batches. |
| Hardware Specification | Yes | All of our experiments are conducted on a single NVIDIA Tesla V100 32GB GPU with PyTorch. |
| Software Dependencies | No | The paper mentions PyTorch as the software used but does not provide a specific version number. It also mentions the Adam (Kingma and Ba 2015) optimizer, which is an algorithm rather than a versioned software dependency. |
| Experiment Setup | Yes | The self-attention layer we used has 768 hidden neurons with 8 heads, and the feed-forward layer contains 3072 hidden neurons. The encoder consists of 12 transformer layers with a dropout rate of 0.1, and the decoder has 6 transformer layers with a dropout rate of 0.2. For adapter modules, the hidden size is 64. The vocabulary size is set to 30K. For meta-training, unless otherwise specified, a meta-batch includes 3 tasks, and the batch size for each task is 4. The base-learner and meta-learner are both optimized with Adam (Kingma and Ba 2015) optimizer, and the learning rate is set to 0.0002. The inner gradient step is 4, and the whole model is trained with 6K meta-steps. |
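
The reported meta-training setup can be illustrated with a short sketch. Below is a minimal, self-contained first-order MAML-style loop in PyTorch using the hyperparameters quoted above (3 tasks per meta-batch, batch size 4 per task, 4 inner gradient steps, Adam with learning rate 0.0002, 6K meta-steps). The toy model, synthetic task data, and loss function are hypothetical placeholders for the paper's adapter-augmented transformer summarizer; this is not the authors' implementation, and the paper's exact meta-learning procedure may differ in details such as which parameters are adapted.

```python
# Minimal first-order MAML-style sketch with the reported hyperparameters.
# The model, data, and loss are stand-ins, not the paper's actual code.
import copy
import torch
import torch.nn as nn
from torch.optim import Adam

META_BATCH_TASKS = 3      # tasks per meta-batch (paper: 3)
TASK_BATCH_SIZE = 4       # examples per task batch (paper: 4)
INNER_STEPS = 4           # inner gradient steps (paper: 4)
META_STEPS = 6000         # total meta-steps (paper: 6K)
LR = 2e-4                 # base- and meta-learner learning rate (paper: 0.0002)

# Hypothetical toy network standing in for the adapter-augmented summarizer.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
meta_opt = Adam(model.parameters(), lr=LR)
loss_fn = nn.MSELoss()

def sample_batch():
    """Synthetic placeholder for a batch of (source, target) pairs."""
    x = torch.randn(TASK_BATCH_SIZE, 16)
    return x, x.flip(-1)

for meta_step in range(META_STEPS):
    meta_opt.zero_grad()
    for _ in range(META_BATCH_TASKS):
        # Inner loop: adapt a copy of the shared initialization on the task.
        learner = copy.deepcopy(model)
        inner_opt = Adam(learner.parameters(), lr=LR)
        for _ in range(INNER_STEPS):
            x, y = sample_batch()
            inner_opt.zero_grad()
            loss_fn(learner(x), y).backward()
            inner_opt.step()

        # Outer step (first-order): gradients of the adapted learner's query
        # loss are accumulated into the shared initialization.
        xq, yq = sample_batch()
        query_loss = loss_fn(learner(xq), yq)
        grads = torch.autograd.grad(query_loss, learner.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```

In this sketch the adaptation phase described under "Dataset Splits" would correspond to running only the inner loop on the 10 or 100 labeled target examples after meta-training, keeping the meta-learned initialization fixed as the starting point.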