Global-to-local Memory Pointer Networks for Task-Oriented Dialogue
Authors: Chien-Sheng Wu, Richard Socher, Caiming Xiong
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our model can improve copy accuracy and mitigate the common out-of-vocabulary problem. As a result, GLMP is able to improve over the previous state-of-the-art models in both the simulated bAbI Dialogue dataset and the human-human Stanford Multi-domain Dialogue dataset on automatic and human evaluation. |
| Researcher Affiliation | Collaboration | Salesforce Research ({rsocher,cxiong}@salesforce.com); The Hong Kong University of Science and Technology (jason.wu@connect.ust.hk) |
| Pseudocode | No | The paper describes the model architecture and components in detail through text and diagrams but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our model is composed of three parts: global memory encoder, external knowledge, and local memory decoder, as shown in Figure 1(a). (Footnote 1: https://github.com/salesforce) |
| Open Datasets | Yes | We use two public multi-turn task-oriented dialogue datasets to evaluate our model: the bAbI dialogue (Bordes & Weston, 2017) and Stanford multi-domain dialogue (SMD) (Eric et al., 2017). |
| Dataset Splits | Yes | Table 6: Dataset statistics for 2 datasets. ... Train dialogues ... Val dialogues ... Test dialogues ... The hyper-parameters such as hidden size and dropout rate are tuned with grid-search over the development set (per-response accuracy for bAbI Dialogue and BLEU score for the SMD). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The model is implemented in PyTorch and the paper mentions the Adam optimizer, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The model is trained end-to-end using the Adam optimizer (Kingma & Ba, 2015), and learning rate annealing starts from 1e-3 to 1e-4. The number of hops K is set to 1, 3, 6 to compare the performance difference. The weights α, β, γ summing up the three losses are set to 1. All the embeddings are initialized randomly, and a simple greedy strategy is used without beam-search during the decoding stage. The hyper-parameters such as hidden size and dropout rate are tuned with grid-search over the development set (per-response accuracy for bAbI Dialogue and BLEU score for the SMD). ... Table 5: Selected hyper-parameters in each dataset for different hops. The values are the embedding dimension and the GRU hidden size, and the value in parentheses is the dropout rate. For all the models we used a learning rate equal to 0.001, with a decay rate of 0.5. |
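
The Experiment Setup row quotes enough detail to sketch the optimization configuration. Below is a minimal, hypothetical PyTorch sketch of that setup (Adam at 1e-3, learning-rate decay by a factor of 0.5 annealing toward 1e-4, and three losses summed with weights α = β = γ = 1). `GLMPModel`, the three loss functions, `evaluate`, and the data loaders are placeholders, not the authors' released code.

```python
# Minimal, hypothetical sketch of the training setup quoted above -- not the
# authors' released implementation. GLMPModel, the three loss terms, evaluate(),
# and the data loaders are placeholders standing in for the paper's components.
import torch

model = GLMPModel(hidden_size=128, dropout=0.2)  # hidden size / dropout grid-searched on the dev set
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Learning-rate annealing from 1e-3 toward 1e-4: halve the rate (decay 0.5)
# whenever the dev metric stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, min_lr=1e-4)

alpha, beta, gamma = 1.0, 1.0, 1.0  # the three loss weights are all set to 1

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        global_logits, vocab_logits, pointer_logits = model(batch)
        loss = (alpha * global_pointer_loss(global_logits, batch)
                + beta * vocab_loss(vocab_logits, batch)
                + gamma * local_pointer_loss(pointer_logits, batch))
        loss.backward()
        optimizer.step()

    # Dev metric: per-response accuracy for bAbI Dialogue, BLEU for SMD.
    dev_metric = evaluate(model, dev_loader)
    scheduler.step(dev_metric)
```

Per the quoted setup, embeddings would be initialized randomly and decoding at test time would be greedy (argmax at each step, no beam search).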