Zero-Shot Adaptive Transfer for Conversational Language Understanding

Authors: Sungjin Lee, Rahul Jha

AAAI 2019, pp. 6642-6649 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimentation over a dataset of 10 domains relevant to our commercial personal digital assistant shows that our model outperforms previous state-of-the-art systems by a large margin, and achieves an even higher improvement in the low data regime."
Researcher Affiliation | Industry | "Sungjin Lee, Rahul Jha, Microsoft Corporation, Redmond, WA {sule,rajh}@microsoft.com"
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | No | "For our experiments, we collected data from a set of ten diverse domains. Table 1 shows the domains along with some statistics and sample utterances. Since these are new domains for our digital assistant, we did not have enough data for these domains in our historical logs. Therefore, the data was collected using crowdsourcing from human judges."
Dataset Splits | Yes | "For each of the domains, we sampled 80% of the data as training and 10% each as dev and test sets."
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions using GloVe embeddings and the spaCy library (for POS tagging), but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We initialized all LSTMs using the Xavier uniform distribution (Glorot and Bengio 2010). We use the Adam optimizer (Kingma and Ba 2015a), with gradients computed on mini-batches of size 32 and clipped with norm value 5. The learning rate was set to 1 × 10⁻³ throughout training and all the other hyperparameters were left as suggested in (Kingma and Ba 2015a). We performed early stopping based on the performance of the evaluation data to avoid overfitting."
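The 80%/10%/10% per-domain split reported under Dataset Splits could be reproduced along these lines. This is a minimal sketch, not the authors' code: the paper does not describe its sampling procedure, so the shuffling and the fixed seed here are assumptions.

```python
import random

def split_domain(utterances, seed=0):
    """Split one domain's utterances into 80% train, 10% dev, 10% test.

    Hypothetical helper: the seed and shuffle-before-split strategy are
    assumptions, since the paper only states the split proportions.
    """
    rng = random.Random(seed)
    data = list(utterances)
    rng.shuffle(data)
    n_train = int(0.8 * len(data))
    n_dev = int(0.1 * len(data))
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]
    return train, dev, test

# Example with 100 dummy utterances for one domain.
train, dev, test = split_domain(range(100))
```

Applying the split independently per domain, as quoted, keeps each domain's dev and test sets in-domain.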
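The Experiment Setup row mentions early stopping on dev-set performance without further detail. A patience-based variant is one common reading; the patience value and the higher-is-better metric below are assumptions, not details from the paper.

```python
class EarlyStopping:
    """Stop training once the dev metric fails to improve for `patience` epochs.

    Hypothetical implementation: the paper states only that early stopping
    on evaluation-data performance was used, not its exact criterion.
    """

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, dev_metric):
        """Record one epoch's dev metric; return True if training should stop."""
        if dev_metric > self.best:
            self.best = dev_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Example: dev F1 per epoch peaks at epoch 2, then degrades.
stopper = EarlyStopping(patience=2)
history = [0.70, 0.75, 0.74, 0.73]
stops = [stopper.step(m) for m in history]
```

Here training would halt after the fourth epoch, keeping the checkpoint from the best dev score.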