BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles

Authors: Yunxiang Zhang, Xiaojun Wan (pp. 11748-11756)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments with multiple pretrained models on BiRdQA under monolingual, cross-lingual and multilingual settings (Jing, Xiong, and Yan 2019). Monolingual: we use data in the same language for training and evaluating models (i.e., en→en, zh→zh). Cross-lingual: we test performance in zero-shot cross-lingual transfer learning, where a multilingual pretrained model is fine-tuned on one source language and evaluated on a different target language (i.e., en→zh, zh→en). Multilingual: we directly mix training instances of the two languages into a single training set and build a single QA model to handle bilingual riddles in BiRdQA (i.e., en+zh→en, en+zh→zh).
Researcher Affiliation | Academia | Yunxiang Zhang, Xiaojun Wan; Wangxuan Institute of Computer Technology, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University; {yx.zhang,wanxiaojun}@pku.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states, 'The dataset is publicly available at https://forms.gle/NvT7DfWhAPhvoFvH7.', which is a link to the dataset, not the source code for the methodology.
Open Datasets | Yes | We introduce BiRdQA, a bilingual multiple-choice question answering dataset with 6614 English riddles and 8751 Chinese riddles. ... The dataset is publicly available at https://forms.gle/NvT7DfWhAPhvoFvH7.
Dataset Splits | Yes | Table 1 describes the key statistics of BiRdQA: # Training examples 4093 (en) / 5943 (zh); # Validation examples 1061 (en) / 1042 (zh); # Test examples 1460 (en) / 1766 (zh); # Total examples 6614 (en) / 8751 (zh).
Hardware Specification | No | The paper states, 'Due to limitation of computational resource, we restrict the input length to 256 tokens for all models except 150 for UnifiedQA,' but provides no specific details about the hardware used (e.g., GPU models, CPU types).
Software Dependencies | No | The paper mentions using 'Huggingface implementations for all the baseline models' and the 'jieba toolkit' for Chinese word segmentation, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | No | The paper states that 'All hyper-parameters are decided by the model performance on the development set' and mentions model selection constraints, but it does not provide specific hyperparameter values or detailed training configurations.
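The three evaluation settings quoted in the Research Type row can be sketched as a small configuration table. This is an illustrative sketch only: the `SETTINGS` dict and `training_languages` helper are hypothetical names, not artifacts from the paper; only the train/eval language pairings themselves come from the quoted text.

```python
# Hypothetical encoding of the paper's three evaluation settings.
# Each entry is a (training data, evaluation data) language pair.
SETTINGS = {
    "monolingual":   [("en", "en"), ("zh", "zh")],
    "cross-lingual": [("en", "zh"), ("zh", "en")],   # zero-shot transfer
    "multilingual":  [("en+zh", "en"), ("en+zh", "zh")],  # mixed training set
}

def training_languages(setting: str) -> set:
    """Return every training-data configuration used in a setting."""
    return {train for train, _ in SETTINGS[setting]}

# In the cross-lingual setting, the model is never fine-tuned on the
# language it is evaluated on:
assert all(train != evaluate for train, evaluate in SETTINGS["cross-lingual"])

# In the multilingual setting, a single mixed training set serves both
# evaluation languages:
assert training_languages("multilingual") == {"en+zh"}
```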
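The split sizes in the Dataset Splits row can be checked for internal consistency: per language, train + validation + test should equal the reported total. A minimal sketch (the `splits` dict layout is ours; the numbers are the paper's):

```python
# Per-language split sizes quoted from Table 1 of the paper.
splits = {
    "en": {"train": 4093, "validation": 1061, "test": 1460},
    "zh": {"train": 5943, "validation": 1042, "test": 1766},
}
totals = {"en": 6614, "zh": 8751}

# Each language's splits should sum to its reported total.
for lang, parts in splits.items():
    assert sum(parts.values()) == totals[lang], lang
```

Both languages check out: 4093 + 1061 + 1460 = 6614 and 5943 + 1042 + 1766 = 8751.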