Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Multilingual Transfer Learning for QA using Translation as Data Augmentation
Authors: Mihaela Bornea, Lin Pan, Sara Rosenthal, Radu Florian, Avirup Sil | Pages: 12583-12591
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TYDI QA datasets. |
| Researcher Affiliation | Industry | Mihaela Bornea, Lin Pan, Sara Rosenthal, Radu Florian, Avirup Sil IBM Research AI, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 EMAIL |
| Pseudocode | Yes | Algorithm 1 Pseudo-code for adversarial training on the multilingual QA task. ... Algorithm 2 Pseudo-code for our language arbitration framework for the multilingual QA task. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the source code of the described methodology. It mentions using 'IBM Watson Language Translator' but this is a service, not their code release. |
| Open Datasets | Yes | We train our models on the SQuAD v1.1 dataset (details in Table 1). ... TYDI QA: ... train our models on SQuAD v1.1... |
| Dataset Splits | Yes | We perform hyper-parameter selection on the SQuAD and MLQA dev split. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'MBERTQA' and 'MBERT' but does not provide specific version numbers for any underlying software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use 3 × 10⁻⁵ as the learning rate, 384 as maximum sequence length, and a doc stride of 128. Everything except ZS was trained for 1 epoch. ... The discriminator is implemented as a multilayer perceptron with 2 hidden layers and a hidden size of 768 × 4. |
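
The maximum sequence length and doc stride quoted in the Experiment Setup row interact when a passage is longer than the model input. The sketch below illustrates one common convention, the sliding-window split used in the original BERT SQuAD preprocessing. It is an assumption that the paper follows this convention; the names `REPORTED_SETUP` and `split_into_windows`, and the three reserved special tokens, are hypothetical and not taken from the paper.

```python
# Hyperparameters as quoted in the Experiment Setup row.
# The dictionary keys are illustrative names, not the authors' configuration.
REPORTED_SETUP = {
    "learning_rate": 3e-5,
    "max_seq_length": 384,
    "doc_stride": 128,
    "num_train_epochs": 1,
}


def split_into_windows(num_doc_tokens, num_query_tokens,
                       max_seq_length=384, doc_stride=128):
    """Return (start, length) spans covering a tokenized passage.

    Assumes the BERT-style input layout [CLS] query [SEP] passage [SEP],
    so three positions are reserved for special tokens.
    """
    max_doc_tokens = max_seq_length - num_query_tokens - 3
    if max_doc_tokens <= 0:
        raise ValueError("query leaves no room for passage tokens")

    spans = []
    start = 0
    while start < num_doc_tokens:
        # Each window holds as many passage tokens as fit in the input.
        length = min(num_doc_tokens - start, max_doc_tokens)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reached the end of the passage
        # Consecutive windows start doc_stride tokens apart, so they overlap.
        start += min(length, doc_stride)
    return spans


if __name__ == "__main__":
    # A 500-token passage with a 10-token question needs three windows.
    print(split_into_windows(500, 10))
```

Under these assumptions, a passage that fits in a single input yields one span, and longer passages yield overlapping spans so that an answer near a window boundary appears with context in at least one window.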