Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing
Authors: Run-Ze Wang, Zhen-Hua Ling, Jing-Bo Zhou, Yu Hu
AAAI 2021, pp. 13979-13987
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the challenging CoSQL dataset demonstrate the effectiveness of our proposed method, which achieves better performance than other published methods on the task leaderboard. We evaluate our model on the CoSQL dataset, which is the largest and the most difficult dataset for conversational and multi-turn text-to-SQL semantic parsing. Experimental results show that our model improves the question-matching accuracy (QM) of the previous best model (Zhang et al. 2019) from 40.8% to 41.8% and the interaction-matching accuracy (IM) from 13.7% to 15.2%, respectively. |
| Researcher Affiliation | Collaboration | Run-Ze Wang1, Zhen-Hua Ling1, Jing-Bo Zhou2, Yu Hu1,3. 1National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China; 2Business Intelligence Lab, Baidu Research; 3iFLYTEK Research |
| Pseudocode | No | The paper describes its methods textually and with mathematical formulas but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code is published to help replicate our results (footnote 5: https://github.com/runzewang/IST-SQL). |
| Open Datasets | Yes | We evaluated our model on CoSQL (Yu et al. 2019a), which is a large-scale conversational and multi-turn text-to-SQL semantic parsing dataset. An interaction example in CoSQL is shown in Figure 1. CoSQL consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k conversations querying 200 complex databases spanning 138 domains. |
| Dataset Splits | Yes | CoSQL...It has been separated into 2164 interactions with 140 databases for training, 292 interactions with 20 databases for development and 551 interactions with 40 databases for testing. SParC consists of 3034 interactions with 140 databases for training, 422 interactions with 20 databases for development and 841 interactions with 40 databases for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The model was implemented using PyTorch. The BERT embedder was initialized with a pre-trained small uncased BERT model. (The paper's footnotes 3 and 4 link to general project pages, not to specific version numbers of the libraries used in the implementation.) |
| Experiment Setup | Yes | All hidden states in our proposed IST-SQL model had 300 dimensions except the BERT embedder with 768 hidden dimensions. The head number K was set as 3 heuristically. When training model parameters, in addition to the average token-level cross-entropy loss for all the SQL tokens in an interaction, regularization terms were added to encourage the diversity of the multi-head attentions used in the utterance encoder and schema-linking modules. The model was implemented using PyTorch. We used the Adam optimizer (Kingma and Ba 2014) to minimize the loss function. The BERT embedder was initialized with a pre-trained small uncased BERT model. All the other parameters were randomly initialized from a uniform distribution between [-0.1, 0.1]. The BERT embedder was fine-tuned with a learning rate of 1e-5, while the other parameters were trained with a learning rate of 1e-3. An early-stopping mechanism with a patience of 10 on the development set was used. The best model was selected based on the token-level string-matching accuracy on the development set. |
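
The two reported metrics, question-matching (QM) and interaction-matching (IM) accuracy, can be sketched as follows. This is a minimal illustration, not the official CoSQL evaluator: the real evaluator performs component-level SQL matching, whereas plain string equality is used here as a stand-in, and the function name is our own.

```python
def qm_im_accuracy(interactions):
    """Sketch of CoSQL-style metrics.

    `interactions` is a list of interactions; each interaction is a list
    of (predicted_sql, gold_sql) pairs, one per turn.
    QM: fraction of individual turns whose prediction matches the gold query.
    IM: fraction of interactions in which *every* turn is correct.
    """
    total_turns = correct_turns = correct_interactions = 0
    for turns in interactions:
        all_correct = True
        for pred, gold in turns:
            total_turns += 1
            if pred == gold:          # stand-in for component-level matching
                correct_turns += 1
            else:
                all_correct = False
        if all_correct:
            correct_interactions += 1
    qm = correct_turns / total_turns
    im = correct_interactions / len(interactions)
    return qm, im
```

This makes the gap between the two numbers intuitive: IM is much stricter than QM, since one wrong turn invalidates the whole interaction, which is why IM (15.2%) is far below QM (41.8%).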
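
The training setup in the last row (Adam with a learning rate of 1e-5 for the fine-tuned BERT embedder and 1e-3 for the uniformly initialized remaining parameters, plus early stopping with a patience of 10) can be sketched in PyTorch as below. The attribute `model.bert` and the `EarlyStopper` class are assumed names for illustration, not taken from the IST-SQL code base.

```python
import torch


def build_optimizer(model):
    """Adam with per-group learning rates, as described in the paper."""
    bert_params = list(model.bert.parameters())
    bert_ids = {id(p) for p in bert_params}
    other_params = [p for p in model.parameters() if id(p) not in bert_ids]
    # Non-BERT parameters: uniform initialization in [-0.1, 0.1].
    for p in other_params:
        torch.nn.init.uniform_(p, -0.1, 0.1)
    return torch.optim.Adam([
        {"params": bert_params, "lr": 1e-5},   # fine-tuned BERT embedder
        {"params": other_params, "lr": 1e-3},  # all other parameters
    ])


class EarlyStopper:
    """Stop when dev accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, dev_accuracy):
        if dev_accuracy > self.best:
            self.best, self.bad_epochs = dev_accuracy, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means stop training
```

Per-parameter-group learning rates are the idiomatic PyTorch way to combine a gently fine-tuned pre-trained encoder with freshly initialized task-specific layers in one optimizer.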