Learning End-to-End Goal-Oriented Dialog
Authors: Antoine Bordes, Y-Lan Boureau, Jason Weston
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper proposes a testbed to break down the strengths and shortcomings of end-to-end dialog systems in goal-oriented applications. Set in the context of restaurant reservation, the tasks require manipulating sentences and symbols in order to properly conduct conversations, issue API calls, and use the outputs of such calls. The authors show that an end-to-end dialog system based on Memory Networks can reach promising, yet imperfect, performance and learn to perform non-trivial operations, and they confirm those results against a hand-crafted slot-filling baseline on data from the second Dialog State Tracking Challenge (Henderson et al., 2014a). This is supported by the comprehensive 'Table 2: Test results across all tasks and methods.' |
| Researcher Affiliation | Industry | Antoine Bordes, Y-Lan Boureau & Jason Weston Facebook AI Research New York, USA {abordes, ylan, jase}@fb.com |
| Pseudocode | No | The paper describes the Memory Networks implementation in Appendix A through mathematical formulations and textual explanations of the steps involved, but it does not include a block explicitly labeled 'Pseudocode' or 'Algorithm'; see the illustrative sketch after this table. |
| Open Source Code | No | The paper states, 'The data for all tasks is available at http://fb.ai/babi,' which refers to the dataset. However, it does not provide an explicit statement or link for the open-source code of the methodology described in the paper (e.g., their Memory Network implementation). |
| Open Datasets | Yes | All our tasks involve a restaurant reservation system, where the goal is to book a table at a restaurant. The first five tasks are generated by a simulation, the last one uses real human-bot dialogs. The data for all tasks is available at http://fb.ai/babi. Table 1 also lists 'Training dialogs' with specific counts. |
| Dataset Splits | Yes | Table 1: Data used in this paper. Training dialogs, Validation dialogs, Test dialogs are listed with specific counts for each task, e.g., for Tasks 1-5, Training dialogs: 1,000, Validation dialogs: 1,000, Test dialogs: 1,000. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, processor types, or memory amounts used for running its experiments. It does not mention any cloud or cluster resources with specifications. |
| Software Dependencies | No | The paper discusses various models and training procedures (e.g., 'trained using stochastic gradient descent (SGD)') and points to frameworks only indirectly by citing papers (e.g., the 'MemN2N architecture of Sukhbaatar et al. (2015)'). It does not provide specific software dependencies with version numbers, such as Python versions, deep learning frameworks (e.g., TensorFlow, PyTorch), or CUDA versions. |
| Experiment Setup | Yes | Appendix C, titled 'HYPERPARAMETERS', explicitly displays the values of the hyperparameters of the best Supervised Embeddings (Table 8) and Memory Networks (Table 9) selected for each task, including 'Learning Rate', 'Margin m', 'Embedding Dim d', 'Negative Cand. N', and 'Nb Hops'; a configuration sketch mirroring these fields appears after this table. |
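
The 'Pseudocode: No' finding above means the model is specified only as prose and equations in Appendix A. For orientation, here is a minimal NumPy sketch of a single attention hop, following the standard MemN2N formulation of Sukhbaatar et al. (2015) on which the paper builds; the function name, shapes, and the additive controller update are assumptions for illustration, not the authors' code.

```python
import numpy as np

def memn2n_hop(query, memories):
    """One attention hop of a MemN2N-style reader (illustrative sketch).

    query:    (d,)   embedding of the current user utterance
    memories: (n, d) embeddings of the dialog history (the memory)
    """
    # Dot-product attention of the query against every memory slot.
    scores = memories @ query                  # (n,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                       # softmax over memory slots
    # Output memory representation: attention-weighted sum of slots.
    o = probs @ memories                       # (d,)
    # Controller update between hops: q_{k+1} = q_k + o_k.
    return query + o

# Toy usage: 5 memory slots, embedding dimension 8, 3 hops.
rng = np.random.default_rng(0)
d, n, n_hops = 8, 5, 3
q = rng.normal(size=d)
mem = rng.normal(size=(n, d))
for _ in range(n_hops):
    q = memn2n_hop(q, mem)
```

In the actual model, the controller state after the final hop is matched against embeddings of all candidate responses, and the highest-scoring candidate is returned as the system's reply.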
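
Similarly, the hyperparameters in Appendix C are reported only as per-task tables. A hypothetical configuration dictionary mirroring those fields might look like the following; the field names come from Tables 8 and 9, while the values are placeholders rather than the paper's selected settings.

```python
# Hypothetical config mirroring the hyperparameter fields of
# Appendix C (Tables 8-9); the values below are placeholders,
# not the paper's per-task selections.
memn2n_config = {
    "learning_rate": 0.01,   # 'Learning Rate' (SGD step size)
    "margin_m": 0.1,         # 'Margin m' of the ranking loss
    "embedding_dim_d": 32,   # 'Embedding Dim d'
    "negative_cand_N": 100,  # 'Negative Cand. N' sampled per update
    "nb_hops": 3,            # 'Nb Hops' over the dialog memory
}
```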