Semantic Parsing in Task-Oriented Dialog with Recursive Insertion-Based Encoder

Authors: Elman Mansimov, Yi Zhang

AAAI 2022, pp. 11067-11075

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our proposed approach on low-resource and high-resource versions of the popular conversational semantic parsing dataset TOP (Gupta et al. 2018; Chen et al. 2020). We compare our model against a state-of-the-art transition-based parser RNNG (Gupta et al. 2018; Einolghozati et al. 2019) and seq2seq models (Rongali et al. 2020; Zhu et al. 2020; Aghajanyan et al. 2020; Babu et al. 2021) adapted to this task. We show that our approach achieves the state-of-the-art performance on both low-resource and high-resource settings of TOP. In particular, RINE achieves up to a 13% absolute improvement in exact match in the low-resource setting.
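The exact-match numbers above count a prediction as correct only when the full predicted parse is identical to the reference parse. A minimal sketch of that metric over linearized parses (function and variable names are illustrative, not taken from the paper's code):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of utterances whose predicted parse equals the reference exactly.

    Both arguments are lists of linearized TOP-style parses, e.g.
    "[IN:GET_EVENT [SL:CATEGORY_EVENT concerts ] ]".
    """
    assert len(predictions) == len(references)
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)
```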
Researcher Affiliation | Industry | Elman Mansimov and Yi Zhang, AWS AI Labs, {mansimov, yizhngn}@amazon.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 1 provides a conceptual overview, but it is not pseudocode.
Open Source Code | No | The paper does not provide concrete access to its own source code (e.g., a specific repository link or an explicit code release statement). It mentions building on the fairseq framework but does not link to the authors' implementation.
Open Datasets | Yes | We use the TOP (Gupta et al. 2018) and TOPv2 (Chen et al. 2020) conversational semantic parsing datasets as well as the ACE2005 nested named entity recognition dataset in our experiments. The TOP dataset (Gupta et al. 2018) consists of natural language utterances in two domains: navigation and event. (Dataset link: http://fb.me/semanticparsingdialog)
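TOP annotations are bracketed trees in which intent (IN:) and slot (SL:) labels open a span that can contain both utterance tokens and nested spans. The sketch below reads that bracketed format into a nested (label, children) tree; it assumes the serialization described by Gupta et al. (2018), and the helper itself is hypothetical:

```python
def parse_top_annotation(annotation):
    """Parse a bracketed TOP annotation such as
    "[IN:GET_DISTANCE How far is [SL:DESTINATION the coffee shop ] ]"
    into a nested (label, children) tree."""
    tokens = annotation.replace("[", " [ ").replace("]", " ] ").split()

    def parse(pos):
        assert tokens[pos] == "["
        label = tokens[pos + 1]            # e.g. "IN:GET_DISTANCE" or "SL:DESTINATION"
        children, pos = [], pos + 2
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                child, pos = parse(pos)    # nested intent/slot span
                children.append(child)
            else:
                children.append(tokens[pos])  # plain utterance token
                pos += 1
        return (label, children), pos + 1

    tree, _ = parse(0)
    return tree
```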
Dataset Splits | Yes | This results in 28,414 train, 4,032 valid and 8,241 test utterances. The reminder domain at 500 SPIS contains 4,788 train and 2,526 valid samples, weather 500 SPIS contains 2,372 train and 2,667 valid samples, reminder 25 SPIS contains 493 train and 337 valid samples, and weather 25 SPIS contains 176 train and 147 valid samples.
Hardware Specification | Yes | For all datasets we use 4 Tesla V100 GPUs to train both baseline and proposed model.
Software Dependencies | No | The paper mentions the fairseq framework and specific pre-trained models like RoBERTa, but does not provide version numbers for these or other ancillary software components (e.g., 'fairseq 0.10' or 'Python 3.8').
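Because no versions are pinned, a reproduction has to pick its own stack. One common way to obtain the pre-trained RoBERTa BASE encoder is through fairseq's torch.hub entry point, sketched below; the unpinned versions and this particular loading path are assumptions, not details stated in the paper:

```python
import torch

# Illustrative only: the paper does not pin fairseq, PyTorch, or Python versions.
roberta = torch.hub.load("pytorch/fairseq", "roberta.base")
roberta.eval()

# Encode an utterance and pull contextual features from the encoder.
tokens = roberta.encode("Driving directions to the Eagles game")
features = roberta.extract_features(tokens)   # shape: (1, seq_len, 768)
print(features.shape)
```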
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba 2014) with the following hyperparameters: β1 = 0.9, β2 = 0.98, ϵ = 1e-6 and L2 weight decay of 1e-4. When using RoBERTa BASE, we warm-up the learning rate for 500 steps up to a peak value of 5e-4 and then decay it based on the inverse square root of the update number. When using RoBERTa LARGE, we warm-up the learning rate for 1,000 steps up to a peak value of 1e-5 and then decay it based on the inverse number of update steps. We use a dropout (Srivastava et al. 2014) rate of 0.3 and an attention dropout rate of 0.1 in both our proposed models and seq2seq baseline.
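The quoted RoBERTa BASE settings map directly onto a standard PyTorch Adam optimizer with linear warm-up followed by inverse-square-root decay. The sketch below reconstructs that configuration under those assumptions (the RoBERTa LARGE runs instead decay by the inverse of the step count, which is not shown); the model here is only a placeholder:

```python
import torch

def inverse_sqrt_schedule(warmup_steps):
    """Linear warm-up to the peak LR, then inverse-square-root decay."""
    def lr_lambda(step):
        step = max(step, 1)
        if step < warmup_steps:
            return step / warmup_steps
        return (warmup_steps / step) ** 0.5
    return lr_lambda

model = torch.nn.Linear(768, 768)          # placeholder for the actual parser
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,                               # peak LR quoted for RoBERTa BASE
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=1e-4,                     # L2 weight decay
)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, inverse_sqrt_schedule(500))
```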