Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
Authors: Seanie Lee, Hae Beom Lee, Juho Lee, Sung Ju Hwang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks, where our method largely outperforms all the relevant baselines we consider. |
| Researcher Affiliation | Collaboration | KAIST, AITRICS, South Korea {lsnfamily02, haebeom.lee, juholee, sjhwang82}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Sequential Reptile |
| Open Source Code | No | The paper does not contain an explicit statement or a link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For QA, we use the Gold passage of the TYDI-QA (Clark et al., 2020) dataset... For NER, we use the Wiki Ann dataset (Pan et al., 2017)... For NLI, we use the MNLI (Williams et al., 2018) dataset as the source training dataset and test the model on fourteen languages from XNLI (Conneau et al., 2018) as target languages. |
| Dataset Splits | Yes | Table 6: The number of train/validation instances for each language from TYDI-QA dataset. Split ar bn en fi id ko ru sw te Total Train 14,805 ... Val. 1,034 ... |
| Hardware Specification | No | The paper mentions running experiments 'with a single GPU' and 'in parallel with 8 GPUs' but does not specify the exact GPU model, CPU type, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using 'multilingual BERT', the 'AdamW optimizer', and the 'transformers library' but does not provide specific version numbers for these software components or other dependencies such as Python or PyTorch. |
| Experiment Setup | Yes | We finetune it with the AdamW (Loshchilov & Hutter, 2019) optimizer, setting the inner learning rate α to 3×10⁻⁵. We use batch size 12 for QA and 16 for NER, respectively. For our method, we set the outer learning rate η to 0.1 and the number of inner steps K to 1000. |
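The pseudocode and experiment-setup rows above describe a Reptile-style outer update wrapped around K inner steps whose batches are drawn from the different tasks. Below is a minimal, hypothetical sketch of such a loop using the quoted hyperparameters (inner learning rate α = 3e-5, outer learning rate η = 0.1, K = 1000 inner steps). The names `model`, `task_loaders`, and `compute_loss` are placeholders; since no source code is linked from the paper, this is an illustration of the general update scheme, not the authors' released implementation.

```python
import copy
import random

import torch
from torch.optim import AdamW


def sequential_reptile_round(model, task_loaders, compute_loss,
                             alpha=3e-5, eta=0.1, K=1000):
    """One outer round: K task-interleaved inner steps, then a Reptile update."""
    # Snapshot the meta-parameters theta before the inner trajectory.
    theta = {k: v.detach().clone() for k, v in model.state_dict().items()}

    inner_opt = AdamW(model.parameters(), lr=alpha)
    iters = {name: iter(loader) for name, loader in task_loaders.items()}

    for _ in range(K):
        # Interleave tasks within a single inner trajectory: sample one task
        # per step and take a gradient step on a single batch from it.
        name = random.choice(list(task_loaders))
        try:
            batch = next(iters[name])
        except StopIteration:
            iters[name] = iter(task_loaders[name])
            batch = next(iters[name])

        inner_opt.zero_grad()
        loss = compute_loss(model, batch, task=name)
        loss.backward()
        inner_opt.step()

    # Reptile outer update on floating-point parameters:
    # theta <- theta + eta * (phi_K - theta).
    phi = model.state_dict()
    with torch.no_grad():
        new_theta = {
            k: theta[k] + eta * (phi[k] - theta[k])
            if theta[k].is_floating_point() else phi[k]
            for k in theta
        }
    model.load_state_dict(new_theta)
```

The task-interleaved inner loop is what distinguishes a sequential variant from vanilla Reptile, which would run each task's inner trajectory separately; how the paper samples tasks and batches within Algorithm 1 should be checked against the original text.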