MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing
Authors: Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. |
| Researcher Affiliation | Collaboration | Longxu Dou1, Yan Gao2, Mingyang Pan1, Dingzirui Wang1, Wanxiang Che1, Dechen Zhan1, Jian-Guang Lou2 1 Harbin Institute of Technology 2 Microsoft Research Asia |
| Pseudocode | No | The paper describes its methods in prose and flowcharts (Figures 4 and 5) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/microsoft/ContextualSP |
| Open Datasets | Yes | We build MULTISPIDER based on Spider (Yu et al. 2018), a large-scale cross-database text-to-SQL dataset in English. We also collect data from CSpider (Min and Zhang 2019) and VSpider (Tuan Nguyen, Dao, and Nguyen 2020), which are also free and open text-to-SQL datasets. |
| Dataset Splits | Yes | Only 9691 questions and 5263 SQL queries over 166 databases (train-set and dev-set) are publicly available. |
| Hardware Specification | No | The paper does not explicitly provide details about the specific hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions specific models and frameworks (e.g., mBERT, XLM-RoBERTa-Large, mBART, RAT-SQL) with citations but does not provide specific version numbers for the software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | Training with Augmented Data: During the training phase, we first adopt the augmented data to warm up the model for three epochs to alleviate the noise in the augmented data, then fine-tune the model with the original high-quality training data. (This two-stage schedule is sketched below the table.) |
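
The two-stage schedule quoted in the Experiment Setup row (warm-up on augmented data, then fine-tuning on the original data) can be summarized as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a PyTorch model whose forward pass returns a scalar loss, and the data loaders, optimizer, learning rate, and fine-tuning epoch count are hypothetical placeholders; only the three warm-up epochs come from the paper.

```python
import torch

def train_epochs(model, loader, optimizer, num_epochs):
    """Plain supervised training loop for num_epochs."""
    model.train()
    for _ in range(num_epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch)  # assumes the model returns a scalar loss
            loss.backward()
            optimizer.step()

def train_with_augmented_data(model, augmented_loader, original_loader,
                              lr=1e-5, warmup_epochs=3, finetune_epochs=20):
    # lr and finetune_epochs are placeholder values, not reported in the paper.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # Stage 1: warm up on the (noisier) augmented data for three epochs.
    train_epochs(model, augmented_loader, optimizer, warmup_epochs)
    # Stage 2: fine-tune on the original high-quality training data.
    train_epochs(model, original_loader, optimizer, finetune_epochs)
    return model
```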