Zero-Shot Adaptive Transfer for Conversational Language Understanding
Authors: Sungjin Lee, Rahul Jha
AAAI 2019, pp. 6642-6649 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimentation over a dataset of 10 domains relevant to our commercial personal digital assistant shows that our model outperforms previous state-of-the-art systems by a large margin, and achieves an even higher improvement in the low data regime. |
| Researcher Affiliation | Industry | Sungjin Lee, Rahul Jha Microsoft Corporation, Redmond, WA {sule,rajh}@microsoft.com |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | For our experiments, we collected data from a set of ten diverse domains. Table 1 shows the domains along with some statistics and sample utterances. Since these are new domains for our digital assistant, we did not have enough data for these domains in our historical logs. Therefore, the data was collected using crowdsourcing from human judges. |
| Dataset Splits | Yes | For each of the domains, we sampled 80% of the data as training and 10% each as dev and test sets. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using GloVe embeddings and the spaCy library (for POS tagging), but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We initialized all LSTMs using the Xavier uniform distribution (Glorot and Bengio 2010). We use the Adam optimizer (Kingma and Ba 2015a), with gradients computed on mini-batches of size 32 and clipped with norm value 5. The learning rate was set to 1 × 10⁻³ throughout the training and all the other hyperparameters were left as suggested in (Kingma and Ba 2015a). We performed early stopping based on the performance of the evaluation data to avoid overfitting. |
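
The Dataset Splits row reports an 80/10/10 train/dev/test split per domain. The sketch below is one minimal way to reproduce such a split; the function name `split_domain`, the shuffling, and the fixed seed are assumptions for illustration and are not specified in the paper.

```python
import random

def split_domain(examples, seed=0):
    """Shuffle one domain's utterances and split them 80/10/10 into train/dev/test."""
    rng = random.Random(seed)          # fixed seed only for reproducibility of the sketch
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(0.8 * n)             # 80% training
    n_dev = int(0.1 * n)               # 10% dev; the remainder (~10%) becomes test
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test
```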
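
The Experiment Setup row lists the reported optimization hyperparameters (Xavier-uniform LSTM initialization, Adam with learning rate 1e-3, mini-batches of 32, gradient-norm clipping at 5, early stopping on dev performance). The PyTorch sketch below shows one plausible way to wire these together; the single `nn.LSTM`, its dimensions, and the `training_step` helper are placeholders, since the paper's architecture and code are not available.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the paper's architecture (assumption).
model = nn.LSTM(input_size=300, hidden_size=128, batch_first=True)

# Xavier-uniform initialization of the LSTM weight matrices (Glorot & Bengio 2010);
# biases are left at their defaults.
for name, param in model.named_parameters():
    if "weight" in name:
        nn.init.xavier_uniform_(param)

# Adam with lr = 1e-3; all other Adam hyperparameters at the defaults of Kingma & Ba (2015).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(batch_inputs, targets, loss_fn):
    """One update over a mini-batch (the paper uses batches of size 32),
    with gradients clipped to a total norm of 5 before the optimizer step."""
    optimizer.zero_grad()
    outputs, _ = model(batch_inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item()

# Early stopping (not shown) would wrap the epoch loop, monitoring dev-set performance
# and halting when it stops improving, as described in the Experiment Setup row.
```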