Zero-Shot Adaptive Transfer for Conversational Language Understanding

Authors: Sungjin Lee, Rahul Jha

AAAI 2019, pp. 6642-6649 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimentation over a dataset of 10 domains relevant to our commercial personal digital assistant shows that our model outperforms previous state-of-the-art systems by a large margin, and achieves an even higher improvement in the low data regime."
Researcher Affiliation | Industry | "Sungjin Lee, Rahul Jha, Microsoft Corporation, Redmond, WA {sule,rajh}@microsoft.com"
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | No | "For our experiments, we collected data from a set of ten diverse domains. Table 1 shows the domains along with some statistics and sample utterances. Since these are new domains for our digital assistant, we did not have enough data for these domains in our historical logs. Therefore, the data was collected using crowdsourcing from human judges."
Dataset Splits | Yes | "For each of the domains, we sampled 80% of the data as training and 10% each as dev and test sets."
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions using GloVe embeddings and the spaCy library (for POS tagging), but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We initialized all LSTMs using the Xavier uniform distribution (Glorot and Bengio 2010). We use the Adam optimizer (Kingma and Ba 2015a), with gradients computed on mini-batches of size 32 and clipped with norm value 5. The learning rate was set to 1 × 10⁻³ throughout training and all the other hyperparameters were left as suggested in (Kingma and Ba 2015a). We performed early stopping based on the performance of the evaluation data to avoid overfitting."
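The 80%/10%/10% per-domain split reported under Dataset Splits could be reproduced along these lines. This is a minimal sketch, not the authors' code: the paper does not describe its sampling procedure, so the shuffling and the fixed seed here are assumptions.

```python
import random

def split_domain(utterances, seed=0):
    """Split one domain's utterances into 80% train, 10% dev, 10% test.

    Hypothetical helper: the seed and shuffle-before-split strategy are
    assumptions, since the paper only states the split proportions.
    """
    rng = random.Random(seed)
    data = list(utterances)
    rng.shuffle(data)
    n_train = int(0.8 * len(data))
    n_dev = int(0.1 * len(data))
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]
    return train, dev, test

# Example with 100 dummy utterances for one domain.
train, dev, test = split_domain(range(100))
```

Applying the split independently per domain, as quoted, keeps each domain's dev and test sets in-domain.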
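The Experiment Setup row mentions early stopping on dev-set performance without further detail. A patience-based variant is one common reading; the patience value and the higher-is-better metric below are assumptions, not details from the paper.

```python
class EarlyStopping:
    """Stop training once the dev metric fails to improve for `patience` epochs.

    Hypothetical implementation: the paper states only that early stopping
    on evaluation-data performance was used, not its exact criterion.
    """

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, dev_metric):
        """Record one epoch's dev metric; return True if training should stop."""
        if dev_metric > self.best:
            self.best = dev_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Example: dev F1 per epoch peaks at epoch 2, then degrades.
stopper = EarlyStopping(patience=2)
history = [0.70, 0.75, 0.74, 0.73]
stops = [stopper.step(m) for m in history]
```

Here training would halt after the fourth epoch, keeping the checkpoint from the best dev score.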