Global-to-local Memory Pointer Networks for Task-Oriented Dialogue
Authors: Chien-Sheng Wu, Richard Socher, Caiming Xiong
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our model can improve copy accuracy and mitigate the common out-of-vocabulary problem. As a result, GLMP is able to improve over the previous state-of-the-art models in both the simulated bAbI Dialogue dataset and the human-human Stanford Multi-domain Dialogue dataset on automatic and human evaluation. |
| Researcher Affiliation | Collaboration | Salesforce Research ({rsocher,cxiong}@salesforce.com); The Hong Kong University of Science and Technology (jason.wu@connect.ust.hk) |
| Pseudocode | No | The paper describes the model architecture and components in detail through text and diagrams but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our model is composed of three parts: global memory encoder, external knowledge, and local memory decoder, as shown in Figure 1(a). (Footnote 1: https://github.com/salesforce) |
| Open Datasets | Yes | We use two public multi-turn task-oriented dialogue datasets to evaluate our model: the bAbI dialogue (Bordes & Weston, 2017) and Stanford multi-domain dialogue (SMD) (Eric et al., 2017). |
| Dataset Splits | Yes | Table 6: Dataset statistics for 2 datasets. ... Train dialogues ... Val dialogues ... Test dialogues ... The hyper-parameters such as hidden size and dropout rate are tuned with grid-search over the development set (per-response accuracy for bAbI Dialogue and BLEU score for the SMD). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The model is implemented in PyTorch and the paper mentions the Adam optimizer, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The model is trained end-to-end using the Adam optimizer (Kingma & Ba, 2015), and learning rate annealing starts from 1e-3 to 1e-4. The number of hops K is set to 1, 3, 6 to compare the performance difference. The weights α, β, γ summing up the three losses are set to 1. All the embeddings are initialized randomly, and a simple greedy strategy is used without beam-search during the decoding stage. The hyper-parameters such as hidden size and dropout rate are tuned with grid-search over the development set (per-response accuracy for bAbI Dialogue and BLEU score for the SMD). ... Table 5: Selected hyper-parameters in each dataset for different hops. The values are the embedding dimension and the GRU hidden size, and the value in parentheses is the dropout rate. For all the models we used a learning rate equal to 0.001, with a decay rate of 0.5. |
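
The Experiment Setup row quotes enough detail to sketch the optimization configuration. Below is a minimal, hypothetical PyTorch sketch of that setup (Adam at 1e-3, learning-rate decay by a factor of 0.5 annealing toward 1e-4, and three losses summed with weights α = β = γ = 1). `GLMPModel`, the three loss functions, `evaluate`, and the data loaders are placeholders, not the authors' released code.

```python
# Minimal, hypothetical sketch of the training setup quoted above -- not the
# authors' released implementation. GLMPModel, the three loss terms, evaluate(),
# and the data loaders are placeholders standing in for the paper's components.
import torch

model = GLMPModel(hidden_size=128, dropout=0.2)  # hidden size / dropout grid-searched on the dev set
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Learning-rate annealing from 1e-3 toward 1e-4: halve the rate (decay 0.5)
# whenever the dev metric stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, min_lr=1e-4)

alpha, beta, gamma = 1.0, 1.0, 1.0  # the three loss weights are all set to 1

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        global_logits, vocab_logits, pointer_logits = model(batch)
        loss = (alpha * global_pointer_loss(global_logits, batch)
                + beta * vocab_loss(vocab_logits, batch)
                + gamma * local_pointer_loss(pointer_logits, batch))
        loss.backward()
        optimizer.step()

    # Dev metric: per-response accuracy for bAbI Dialogue, BLEU for SMD.
    dev_metric = evaluate(model, dev_loader)
    scheduler.step(dev_metric)
```

Per the quoted setup, embeddings would be initialized randomly and decoding at test time would be greedy (argmax at each step, no beam search).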