Zero-Shot Text-to-SQL Learning with Auxiliary Task

Authors: Shuaichen Chang, Pengfei Liu, Yun Tang, Jing Huang, Xiaodong He, Bowen Zhou

AAAI 2020, pp. 7488-7495

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we evaluate our models on a large text-to-SQL dataset, WikiSQL. Compared to a strong baseline coarse-to-fine model, our models improve over the baseline by more than 3% absolute in accuracy on the whole dataset. More interestingly, on a zero-shot subset test of WikiSQL, our models achieve a 5% absolute accuracy gain over the baseline, clearly demonstrating their superior generalizability.
Researcher Affiliation | Collaboration | Shuaichen Chang (1), Pengfei Liu (2), Yun Tang (3), Jing Huang (3), Xiaodong He (3), Bowen Zhou (3); affiliations: 1 The Ohio State University, 2 Fudan University, 3 JD.COM AI Research; emails: chang.1692@osu.edu, pfliu14@fudan.edu.cn, {yun.tang, jing.huang, xiaodong.he, bowen.zhou}@jd.com
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The paper includes mathematical formulations for the CLS and PT functions and a 'SQL Sketch' figure, but no structured pseudocode.
Open Source Code | Yes | Our code can be found at https://github.com/JD-AI-Research-Silicon-Valley/auxiliary-task-for-text-to-sql
Open Datasets | Yes | WikiSQL has over 20K tables and 80K questions corresponding to these tables. This dataset was designed for translating natural language questions to SQL queries using the corresponding table columns, without access to the table content.
Dataset Splits | Yes | We split the test set based on the number of shots, i.e., the number of times a table occurs in the training data (a splitting sketch is given after this table).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided in the paper.
Software Dependencies | No | The paper mentions a 300-dim GloVe word embedding and a BiLSTM sentence encoder but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We use 300-dim GloVe word embeddings as our pre-trained embeddings. The hidden size for all LSTMs is 250 and the hidden size in the attention function is set to 64. The loss weight λ is set to 0.5. A 0.5-rate dropout layer is used before each output layer. Each concatenation is followed by one fully-connected layer to reduce the dimension to the original hidden or attention size. The test model is selected as the best-performing model on the validation set (an encoder and loss-weighting sketch follows the table).
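The by-shot test split described in the Dataset Splits row can be reproduced with a few lines of Python. This is a minimal sketch, assuming WikiSQL-style examples stored as dicts with a "table_id" field; the function name and bucket boundaries are illustrative and not taken from the paper or the released code.

```python
from collections import Counter, defaultdict

def split_test_by_shots(train_examples, test_examples, buckets=(0, 1, 5, 10)):
    """Group test examples by how often their table appears in the training data.

    Assumes each example is a dict with a "table_id" key, as in the WikiSQL
    JSONL release. The bucket thresholds are illustrative, not the paper's
    exact split boundaries.
    """
    # Count how many training questions reference each table.
    table_counts = Counter(ex["table_id"] for ex in train_examples)

    subsets = defaultdict(list)
    for ex in test_examples:
        shots = table_counts.get(ex["table_id"], 0)
        # Assign the example to the largest bucket threshold it reaches;
        # shots == 0 is the zero-shot subset.
        key = max(b for b in buckets if shots >= b)
        subsets[key].append(ex)
    return subsets
```

The shots == 0 bucket corresponds to the zero-shot subset on which the paper reports the 5% absolute accuracy gain over the baseline.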
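The hyperparameters listed in the Experiment Setup row can be gathered into a small PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the class and function names are illustrative, and the joint objective is shown only as the λ-weighted sum implied by the reported loss weight of 0.5.

```python
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """BiLSTM question encoder matching the reported settings: 300-dim GloVe
    inputs, LSTM hidden size 250, a projection back to the original hidden
    size after concatenation, and 0.5 dropout before the output layer.
    Names and structure are illustrative, not copied from the released code."""

    def __init__(self, pretrained_glove, hidden_size=250, dropout=0.5):
        super().__init__()
        # pretrained_glove: FloatTensor of shape (vocab_size, 300).
        self.embedding = nn.Embedding.from_pretrained(pretrained_glove, freeze=False)
        self.bilstm = nn.LSTM(input_size=300, hidden_size=hidden_size,
                              batch_first=True, bidirectional=True)
        # The concatenated forward/backward states are reduced back to hidden_size
        # by a single fully-connected layer, as described in the setup.
        self.project = nn.Linear(2 * hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_ids):
        hidden, _ = self.bilstm(self.embedding(token_ids))
        return self.dropout(self.project(hidden))


def combined_loss(main_loss, auxiliary_loss, lam=0.5):
    """Weight the auxiliary task by lambda = 0.5, per the reported setup.
    The exact combination used in the paper may differ."""
    return main_loss + lam * auxiliary_loss
```

The attention hidden size of 64 and the per-output dropout layers would be applied in the downstream prediction modules, which are omitted here to keep the sketch minimal.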