SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL

Authors: Ruichu Cai, Jinjie Yuan, Boyan Xu, Zhifeng Hao

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to study the effectiveness of SADGA. Especially, SADGA outperforms the baseline methods and achieves 3rd place on the challenging Text-to-SQL benchmark Spider [34] at the time of writing.
Researcher Affiliation | Academia | (1) School of Computer Science, Guangdong University of Technology, Guangzhou, China; (2) Peng Cheng Laboratory, Shenzhen, China; (3) College of Science, Shantou University, Shantou, China
Pseudocode | No | The paper describes methods in text and uses figures for illustration, but no explicit pseudocode or algorithm blocks are present.
Open Source Code | Yes | Our implementation will be open-sourced at https://github.com/DMIRLAB-Group/SADGA.
Open Datasets | Yes | In this section, we conduct experiments on the Spider dataset [34], the benchmark of cross-domain Text-to-SQL, to evaluate the effectiveness of our model.
Dataset Splits | Yes | Spider has so far been the most challenging benchmark for cross-domain Text-to-SQL; it contains 9 traditional domain-specific datasets, such as ATIS [8], GeoQuery [36], WikiSQL [1], IMDB [30], etc. It is split into a train set (8,659 examples), a development set (1,034 examples) and a test set (2,147 examples), which are distributed across 146, 20 and 40 databases, respectively. (A split-loading sketch is given after this table.)
Hardware Specification | Yes | We trained our models on one server with a single NVIDIA GTX 3090 GPU.
Software Dependencies | No | The paper mentions optimizers (Adam) and models (BERT, GAP) but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | We follow the original hyperparameters of RATSQL [27]: batch size 20, initial learning rate 7 × 10^−4, max steps 40,000 and the Adam optimizer [16]. For BERT, the initial learning rate is adjusted to 2 × 10^−4, and the max training step is increased to 90,000. We also apply a separate learning rate of 3 × 10^−6 to fine-tune BERT. For GAP, we follow the original settings in Shi et al. [25]. In addition, we stack a 3-layer SADGA followed by a 4-layer RAT.
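
The training settings quoted in the Experiment Setup row can be summarized as a small configuration sketch. This is purely illustrative: the class and field names below are hypothetical and are not taken from the released SADGA code; only the numeric values come from the excerpt above.

```python
from dataclasses import dataclass

@dataclass
class SADGATrainConfig:
    """Illustrative summary of the quoted training settings; names are
    hypothetical, not from the official DMIRLAB-Group/SADGA repository."""
    batch_size: int = 20
    optimizer: str = "adam"          # Adam optimizer [16]
    learning_rate: float = 7e-4      # initial learning rate, following RATSQL [27]
    max_steps: int = 40_000
    num_sadga_layers: int = 3        # 3-layer SADGA ...
    num_rat_layers: int = 4          # ... followed by 4-layer RAT

# Settings the paper adjusts when BERT is used as the pretrained encoder:
bert_config = SADGATrainConfig(
    learning_rate=2e-4,              # initial learning rate with BERT
    max_steps=90_000,                # longer training schedule
)
BERT_FINETUNE_LR = 3e-6              # separate learning rate for fine-tuning BERT itself
```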
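
For the Dataset Splits row, a minimal sketch of how the reported train/dev sizes and database counts could be checked locally is given below. It assumes the file layout of the official Spider release (train_spider.json, train_others.json, dev.json); the 2,147-example test split over 40 databases is held out for leaderboard evaluation and is not checked here.

```python
import json
from collections import Counter

# File names assume the official Spider download layout (an assumption,
# not something stated in the paper).
TRAIN_FILES = ["spider/train_spider.json", "spider/train_others.json"]
DEV_FILE = "spider/dev.json"

def load_examples(paths):
    examples = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            examples.extend(json.load(f))
    return examples

train = load_examples(TRAIN_FILES)   # expected: 8,659 examples in total
dev = load_examples([DEV_FILE])      # expected: 1,034 examples

# Every Spider example carries a "db_id" field naming its database.
train_dbs = Counter(ex["db_id"] for ex in train)
dev_dbs = Counter(ex["db_id"] for ex in dev)

print(f"train: {len(train)} examples over {len(train_dbs)} databases")  # 8659 / 146
print(f"dev:   {len(dev)} examples over {len(dev_dbs)} databases")      # 1034 / 20
```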