Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning

Authors: Seanie Lee, Hae Beom Lee, Juho Lee, Sung Ju Hwang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks, where our method largely outperforms all the relevant baselines we consider.
Researcher Affiliation | Collaboration | KAIST, AITRICS, South Korea. {lsnfamily02, haebeom.lee, juholee, sjhwang82}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1: Sequential Reptile (a sketch of the update follows this table).
Open Source Code | No | The paper does not contain an explicit statement or a link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | For QA, we use the Gold Passage task of the TYDI-QA (Clark et al., 2020) dataset... For NER, we use the WikiAnn dataset (Pan et al., 2017)... For NLI, we use the MNLI (Williams et al., 2018) dataset as the source training dataset and test the model on fourteen languages from XNLI (Conneau et al., 2018) as target languages. (A loading sketch follows this table.)
Dataset Splits | Yes | Table 6: The number of train/validation instances for each language from the TYDI-QA dataset. Split: ar, bn, en, fi, id, ko, ru, sw, te, Total; Train: 14,805, ...; Val.: 1,034, ...
Hardware Specification | No | The paper mentions running experiments "with a single GPU" and "in parallel with 8 GPUs" but does not specify the exact GPU model, CPU type, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions using "multilingual BERT", the "AdamW optimizer", and the "transformers library" but does not provide specific version numbers for these software components or other dependencies such as Python or PyTorch.
Experiment Setup | Yes | We finetune it with the AdamW (Loshchilov & Hutter, 2019) optimizer, setting the inner learning rate α to 3 × 10⁻⁵. We use batch size 12 for QA and 16 for NER, respectively. For our method, we set the outer learning rate η to 0.1 and the number of inner steps K to 1000.
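
The Pseudocode and Experiment Setup rows reference Algorithm 1 (Sequential Reptile) with an inner AdamW learning rate of 3 × 10⁻⁵, an outer learning rate η = 0.1, and K = 1000 inner steps. Below is a minimal PyTorch-style sketch of one outer round under those values; the function name, the `task_loaders` interface, uniform task sampling, and the Hugging Face-style `.loss` output are assumptions made for illustration, not the authors' implementation.

```python
import random
import torch

def sequential_reptile_round(model, task_loaders, K=1000,
                             inner_lr=3e-5, outer_lr=0.1):
    """One outer round of a Sequential Reptile-style update (illustrative sketch).

    task_loaders: dict mapping a task name to an iterator of mini-batches,
    where each batch is a dict of tensors already on the model's device and
    the model returns an object with a `.loss` attribute (as Hugging Face
    models do when labels are passed). This interface is an assumption.
    """
    # Snapshot the current initialization theta before the inner trajectory.
    theta = {name: p.detach().clone() for name, p in model.named_parameters()}

    # Single serial inner trajectory: each of the K steps uses a mini-batch
    # from a randomly sampled task, interleaving tasks instead of running a
    # separate inner loop per task (the "sequential" part).
    inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)
    task_names = list(task_loaders)
    for _ in range(K):
        task = random.choice(task_names)      # uniform sampling is a simplification
        batch = next(task_loaders[task])
        inner_opt.zero_grad()
        loss = model(**batch).loss
        loss.backward()
        inner_opt.step()

    # Reptile outer step: theta <- theta + eta * (phi_K - theta), i.e. move the
    # initialization toward the end point of the interleaved inner trajectory.
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.copy_(theta[name] + outer_lr * (p - theta[name]))
```

The defining choice is the single interleaved inner trajectory: standard Reptile applied to multi-task learning would run a separate inner loop per task and average the resulting updates, whereas interleaving batches from all tasks in one trajectory is what the paper argues implicitly aligns inter-task gradients. In a full run this outer round is repeated many times; details such as task-sampling weights are not specified in the excerpts quoted above.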
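
The Open Datasets and Software Dependencies rows name TYDI-QA (Gold Passage), WikiAnn, MNLI, XNLI, multilingual BERT, and the transformers library, but no versions or download paths. The sketch below shows how these artifacts might be pulled today with the Hugging Face `datasets` and `transformers` libraries; the Hub identifiers and configuration names (`tydiqa`/`secondary_task`, `wikiann`, `multi_nli`, `xnli`, `bert-base-multilingual-cased`) are assumptions about the current Hub layout, not references given in the paper.

```python
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

# Multilingual BERT backbone; the exact checkpoint is not pinned in the paper.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

# QA: Gold Passage ("secondary") task of TyDi QA.
tydiqa_gold = load_dataset("tydiqa", "secondary_task")

# NER: WikiAnn, one configuration per language (English shown here).
wikiann_en = load_dataset("wikiann", "en")

# NLI: MNLI as the source training set ...
mnli = load_dataset("multi_nli")

# ... and XNLI for zero-shot cross-lingual evaluation.
xnli = load_dataset("xnli", "all_languages")
```

Because the paper does not pin library versions, an exact reproduction would additionally require fixing `transformers`, `datasets`, Python, and PyTorch versions in a requirements file.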