Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL
Authors: Ruichu Cai, Jinjie Yuan, Boyan Xu, Zhifeng Hao
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to study the effectiveness of SADGA. Especially, SADGA outperforms the baseline methods and achieves 3rd place on the challenging Text-to-SQL benchmark Spider 2 [34] at the time of writing. |
| Researcher Affiliation | Academia | 1 School of Computer Science, Guangdong University of Technology, Guangzhou, China; 2 Peng Cheng Laboratory, Shenzhen, China; 3 College of Science, Shantou University, Shantou, China |
| Pseudocode | No | The paper describes methods in text and uses figures for illustration, but no explicit pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | Our implementation will be open-sourced at https://github.com/DMIRLAB-Group/SADGA. |
| Open Datasets | Yes | In this section, we conduct experiments on the Spider dataset [34], the benchmark of cross-domain Text-to-SQL, to evaluate the effectiveness of our model. |
| Dataset Splits | Yes | The Spider has been so far the most challenging benchmark on cross-domain Text-to-SQL, which contains 9 traditional specific-domain datasets, such as ATIS [8], GeoQuery [36], WikiSQL [1], IMDB [30] etc. It is split into the train set (8659 examples), development set (1034 examples) and test set (2147 examples), which are respectively distributed across 146, 20 and 40 databases. |
| Hardware Specification | Yes | We trained our models on one server with a single NVIDIA GTX 3090 GPU. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and models (BERT, GAP) but does not provide specific version numbers for software dependencies like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | We follow the original hyperparameters of RATSQL [27] that uses batch size 20, initial learning rate 7 × 10^-4, max steps 40,000 and the Adam optimizer [16]. For BERT, the initial learning rate is adjusted to 2 × 10^-4, and the max training step is increased to 90,000. We also apply a separate learning rate of 3 × 10^-6 to fine-tune BERT. For GAP, we follow the original settings in Shi et al. [25]. In addition, we stack 3-layer SADGA followed by 4-layer RAT. |
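
The setup and split figures quoted in the table can be collected in one place when attempting a reproduction. The following is a minimal sketch, not the authors' configuration format: the names `TrainConfig`, `BASE_CONFIG`, `BERT_CONFIG`, and `SPIDER_SPLITS` are hypothetical, and the BERT variant is assumed to inherit every value the quoted text does not explicitly change (for example, batch size 20). The GAP settings are omitted because the table only defers to Shi et al. [25].

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    batch_size: int
    learning_rate: float          # initial learning rate
    max_steps: int
    optimizer: str = "adam"       # the quoted setup uses Adam
    bert_learning_rate: Optional[float] = None  # separate rate for fine-tuning BERT
    num_sadga_layers: int = 3     # "3-layer SADGA"
    num_rat_layers: int = 4       # "followed by 4-layer RAT"


# Values quoted for the non-BERT setting (following RATSQL).
BASE_CONFIG = TrainConfig(batch_size=20, learning_rate=7e-4, max_steps=40_000)

# Assumption: anything not explicitly changed for BERT is inherited from the base setup.
BERT_CONFIG = replace(
    BASE_CONFIG,
    learning_rate=2e-4,
    max_steps=90_000,
    bert_learning_rate=3e-6,
)

# Spider split sizes as quoted: (number of examples, number of databases).
SPIDER_SPLITS = {
    "train": (8659, 146),
    "dev": (1034, 20),
    "test": (2147, 40),
}

if __name__ == "__main__":
    print(BASE_CONFIG)
    print(BERT_CONFIG)
    for name, (examples, databases) in SPIDER_SPLITS.items():
        print(f"{name}: {examples} examples across {databases} databases")
```

Keeping the configuration frozen and deriving the BERT variant with `replace` makes the differences between the two quoted settings explicit, which is convenient when comparing a reproduction run against the reported setup.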