Effective Slot Filling via Weakly-Supervised Dual-Model Learning
Authors: Jue Wang, Ke Chen, Lidan Shou, Sai Wu, Gang Chen (pp. 13952–13960)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that our approach achieves better results than standard baselines on multiple datasets, especially in the low-resource setting. We evaluate the performance of our method on three different datasets, namely SNIPS (Coucke et al. 2018), ATIS (Hemphill, Godfrey, and Doddington 1990; Tur, Hakkani-Tür, and Heck 2010) and MIT Rest. (Liu et al. 2013). |
| Researcher Affiliation | Academia | 1 College of Computer Science and Technology, Zhejiang University; 2 State Key Laboratory of CAD&CG, Zhejiang University. {zjuwangjue,chenk,should,wusai,cg}@zju.edu.cn |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams (Figure 1 and Figure 2) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/LorrinWWW/weakly-supervised-slot-filling |
| Open Datasets | Yes | We evaluate the performance of our method on three different datasets, namely SNIPS (Coucke et al. 2018), ATIS (Hemphill, Godfrey, and Doddington 1990; Tur, Hakkani-Tür, and Heck 2010) and MIT Rest. (Liu et al. 2013). |
| Dataset Splits | Yes | We use the standard train-dev-test split for these datasets. For ATIS and MIT Rest., since they do not have a standard development set, we randomly pick 10% of the original training set as the development set. And for each run, we save the model checkpoint that achieves the highest F1 score on the dev set, and report its score on the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, processors, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'GloVe word vectors', 'BERT (bert-large-uncased)', and 'Adam' but does not specify their version numbers or any other software dependencies with versions. |
| Experiment Setup | Yes | For each mini-batch, we sample 30 utterances from labeled data and from weakly-labeled data. GloVe word vectors (Pennington, Socher, and Manning 2014) are used to initialize word embeddings, which are tuned during training. We also use BERT (bert-large-uncased, fixed without fine-tuning) to produce contextualized embeddings concatenated after the word embeddings. We set the hidden size to 200, and since we use bidirectional LSTMs, the hidden size for each LSTM is 100. We also apply 0.3 dropout after embeddings and LSTMs to mitigate the over-fitting issue. We use Adam with a learning rate of 1e-3 to train the model. |
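
As a concrete illustration of the split procedure quoted in the Dataset Splits row, the following is a minimal sketch, assuming the training data is available as a list of labeled utterances. The 10% dev fraction comes from the quoted text; the function name, the fixed seed, and the shuffling strategy are illustrative assumptions, not the authors' code.

```python
import random

def make_dev_split(train_examples, dev_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the training set as a development set.

    Mirrors the paper's description for ATIS and MIT Rest., which lack a
    standard dev set. The seed value here is an illustrative assumption.
    """
    examples = list(train_examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    return examples[n_dev:], examples[:n_dev]  # (train, dev)
```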
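
The Experiment Setup row pins down most encoder hyperparameters: GloVe-initialized word embeddings that are fine-tuned, fixed bert-large-uncased features concatenated to them, bidirectional LSTMs with 100 hidden units per direction (200 total), 0.3 dropout after embeddings and LSTMs, and Adam at a learning rate of 1e-3. Below is a minimal PyTorch sketch of that configuration; the class name, vocabulary handling, and the linear tagging head are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class WeaklySupervisedSlotEncoder(nn.Module):
    """Sketch of the encoder configuration described in the experiment setup.

    Word embeddings are initialized from GloVe and tuned during training;
    fixed BERT (bert-large-uncased, hidden size 1024) embeddings are
    concatenated to them before a bidirectional LSTM with 100 hidden units
    per direction. The slot-tagging head is an illustrative assumption.
    """

    def __init__(self, glove_weights, num_labels, bert_dim=1024, dropout=0.3):
        super().__init__()
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=False)
        word_dim = glove_weights.size(1)
        self.emb_dropout = nn.Dropout(dropout)
        # Bidirectional LSTM: 100 per direction -> 200 total hidden size.
        self.lstm = nn.LSTM(word_dim + bert_dim, 100,
                            batch_first=True, bidirectional=True)
        self.lstm_dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(200, num_labels)

    def forward(self, word_ids, bert_features):
        # bert_features are produced by a frozen bert-large-uncased encoder.
        x = torch.cat([self.word_emb(word_ids), bert_features], dim=-1)
        x = self.emb_dropout(x)
        h, _ = self.lstm(x)
        return self.classifier(self.lstm_dropout(h))

# Training uses Adam with a learning rate of 1e-3, as stated in the quoted setup:
# model = WeaklySupervisedSlotEncoder(glove_weights, num_labels)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```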