Effective Slot Filling via Weakly-Supervised Dual-Model Learning

Authors: Jue Wang, Ke Chen, Lidan Shou, Sai Wu, Gang Chen (pp. 13952-13960)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results demonstrate that our approach achieves better results than standard baselines on multiple datasets, especially in the low-resource setting. We evaluate the performance of our method on three different datasets, namely SNIPS (Coucke et al. 2018), ATIS (Hemphill, Godfrey, and Doddington 1990; Tur, Hakkani-Tür, and Heck 2010) and MIT Rest. (Liu et al. 2013).
Researcher Affiliation | Academia | College of Computer Science and Technology, Zhejiang University; State Key Laboratory of CAD&CG, Zhejiang University. {zjuwangjue,chenk,should,wusai,cg}@zju.edu.cn
Pseudocode | No | The paper describes the model architecture and training process in text and diagrams (Figure 1 and Figure 2) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/LorrinWWW/weakly-supervised-slot-filling
Open Datasets | Yes | We evaluate the performance of our method on three different datasets, namely SNIPS (Coucke et al. 2018), ATIS (Hemphill, Godfrey, and Doddington 1990; Tur, Hakkani-Tür, and Heck 2010) and MIT Rest. (Liu et al. 2013).
Dataset Splits | Yes | We use the standard train-dev-test split for these datasets. For ATIS and MIT Rest., since they do not have a standard development set, we randomly pick 10% of the original training set as the development set. For each run, we save the model checkpoint that achieves the highest F1 score on the dev set and report its score on the test set. (A minimal sketch of this split procedure follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as GloVe word vectors, BERT (bert-large-uncased), and Adam, but does not specify their version numbers or any other software dependencies with versions.
Experiment Setup | Yes | For each mini-batch, we sample 30 utterances from labeled data and from weakly-labeled data. GloVe word vectors (Pennington, Socher, and Manning 2014) are used to initialize word embeddings, which are tuned during training. We also use BERT (bert-large-uncased, fixed without fine-tuning) to produce contextualized embeddings concatenated after the word embeddings. We set the hidden size to 200; since we use bidirectional LSTMs, the hidden size for each LSTM is 100. We also apply 0.3 dropout after embeddings and LSTMs to mitigate over-fitting. We use Adam with a learning rate of 1e-3 to train the model. (A minimal configuration sketch follows the table.)
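The Dataset Splits row describes holding out a random 10% of the original training set as a development set for ATIS and MIT Rest. The sketch below illustrates that procedure only; the function name, seed, and data representation are illustrative assumptions, not taken from the authors' code.

```python
# Minimal sketch of the random 10% dev split, assuming the training data is
# already loaded as a list of examples (e.g., (tokens, slot_labels) pairs).
import random

def make_dev_split(train_examples, dev_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the training set as a dev set,
    as done for ATIS and MIT Rest., which lack a standard dev split."""
    examples = list(train_examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    return examples[n_dev:], examples[:n_dev]  # (new train set, dev set)

# Example usage (atis_train is a hypothetical loaded dataset):
# atis_train_split, atis_dev = make_dev_split(atis_train, dev_fraction=0.1)
```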
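The Experiment Setup row specifies the encoder configuration: tuned GloVe embeddings concatenated with fixed bert-large-uncased features, a bidirectional LSTM with 100 hidden units per direction (200 total), 0.3 dropout after embeddings and LSTMs, and Adam with a learning rate of 1e-3. The PyTorch sketch below reflects only those quoted hyperparameters; the class name, the assumption that BERT features are precomputed, and all tensor shapes are our own and do not come from the paper or its repository.

```python
# A minimal PyTorch sketch of the quoted encoder configuration (assumptions noted above).
import torch
import torch.nn as nn

class SlotEncoder(nn.Module):
    def __init__(self, glove_vectors, bert_dim=1024, hidden_size=200, dropout=0.3):
        super().__init__()
        # GloVe embeddings are fine-tuned during training (freeze=False).
        self.word_emb = nn.Embedding.from_pretrained(glove_vectors, freeze=False)
        glove_dim = glove_vectors.size(1)
        self.dropout = nn.Dropout(dropout)
        # Bidirectional LSTM: hidden_size // 2 = 100 units per direction, 200 total.
        self.lstm = nn.LSTM(glove_dim + bert_dim, hidden_size // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, word_ids, bert_feats):
        # bert_feats: precomputed bert-large-uncased outputs, kept fixed (no fine-tuning).
        x = torch.cat([self.word_emb(word_ids), bert_feats], dim=-1)
        x = self.dropout(x)          # 0.3 dropout after embeddings
        out, _ = self.lstm(x)
        return self.dropout(out)     # 0.3 dropout after the BiLSTM

# Training uses Adam with a learning rate of 1e-3:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```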