MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Authors: Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results under three typical settings (zero-shot, monolingual, and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language.
Researcher Affiliation | Collaboration | Longxu Dou¹, Yan Gao², Mingyang Pan¹, Dingzirui Wang¹, Wanxiang Che¹, Dechen Zhan¹, Jian-Guang Lou²; ¹Harbin Institute of Technology, ²Microsoft Research Asia
Pseudocode | No | The paper describes its methods in prose and flowcharts (Figures 4 and 5) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/microsoft/ContextualSP
Open Datasets | Yes | We build MULTISPIDER based on Spider (Yu et al. 2018), a large-scale cross-database text-to-SQL dataset in English. We also collect data from CSpider (Min and Zhang 2019) and VSpider (Tuan Nguyen, Dao, and Nguyen 2020), which are also free and open text-to-SQL datasets.
Dataset Splits | Yes | Only 9691 questions and 5263 SQL queries over 166 databases (train set and dev set) are publicly available.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper cites specific models and frameworks (e.g., mBERT, XLM-RoBERTa-Large, mBART, RAT-SQL) but does not provide version numbers for the software dependencies or libraries used in its implementation.
Experiment Setup | Yes | Training with Augmented Data: During the training phase, we first adopt the augmented data to warm up the model for three epochs to alleviate the noise in the augmented data, then fine-tune the model with the original high-quality training data.
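
A minimal sketch of how the warm-up-then-fine-tune schedule quoted in the Experiment Setup row could be wired up. The model, data loaders, optimizer, loss, and every hyperparameter except the three warm-up epochs are illustrative assumptions, not the paper's actual RAT-SQL-based implementation.

```python
# Sketch of the two-stage schedule described in the Experiment Setup row:
# warm up on augmented data, then fine-tune on the original training data.
# Model, data loaders, and hyperparameters (other than the three warm-up
# epochs) are placeholders, not the paper's implementation.
import torch
from torch import nn


def run_epochs(model, loader, optimizer, loss_fn, num_epochs, device="cpu"):
    """Run standard supervised training for a fixed number of epochs."""
    model.train()
    for _ in range(num_epochs):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()


def train_with_augmented_data(model, augmented_loader, original_loader,
                              warmup_epochs=3, finetune_epochs=20, lr=1e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    # Stage 1: warm up on the (noisier) augmented data for a few epochs.
    run_epochs(model, augmented_loader, optimizer, loss_fn, warmup_epochs)
    # Stage 2: fine-tune on the original, high-quality training data.
    run_epochs(model, original_loader, optimizer, loss_fn, finetune_epochs)
    return model
```

Only the three warm-up epochs and the ordering (augmented data first, original data second) come from the quoted text; the fine-tuning length and optimizer choice here are hypothetical defaults.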