Learning End-to-End Goal-Oriented Dialog

Authors: Antoine Bordes, Y-Lan Boureau, Jason Weston

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The abstract states: 'This paper proposes a testbed to break down the strengths and shortcomings of end-to-end dialog systems in goal-oriented applications. Set in the context of restaurant reservation, our tasks require manipulating sentences and symbols in order to properly conduct conversations, issue API calls and use the outputs of such calls. We show that an end-to-end dialog system based on Memory Networks can reach promising, yet imperfect, performance and learn to perform non-trivial operations. We confirm those results by comparing our system to a hand-crafted slot-filling baseline on data from the second Dialog State Tracking Challenge (Henderson et al., 2014a).' This is supported by the comprehensive 'Table 2: Test results across all tasks and methods.'
Researcher Affiliation | Industry | Antoine Bordes, Y-Lan Boureau & Jason Weston, Facebook AI Research, New York, USA ({abordes, ylan, jase}@fb.com)
Pseudocode | No | The paper describes the Memory Networks implementation in Appendix A with mathematical formulations and textual explanations of the steps involved, but it does not include a block explicitly labeled 'Pseudocode' or 'Algorithm'.
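Appendix A itself is only described in prose here. As a rough, non-authoritative sketch of the kind of operation such a model performs, the core of a Memory Network is an attention "hop" over the embedded dialog history, following the general MemN2N scheme of Sukhbaatar et al. (2015); all array sizes and the random values below are illustrative placeholders, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_mem = 32, 10                         # embedding dim and memory slots (hypothetical sizes)
memories = rng.normal(size=(n_mem, d))    # embedded dialog history (one row per utterance)
query = rng.normal(size=d)                # embedded last user utterance
R = rng.normal(size=(d, d))               # per-hop linear map on the controller state

# Attention over memories: softmax of inner products with the query.
scores = memories @ query
attn = np.exp(scores - scores.max())
attn /= attn.sum()

# Read-out: attention-weighted sum of memories, then update the query state,
# which would feed the next hop (or the final response-scoring step).
o = attn @ memories
q_next = R @ (query + o)
```

In the full model this hop is repeated 'Nb Hops' times before candidate responses are scored against the final controller state.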
Open Source Code | No | The paper states, 'The data for all tasks is available at http://fb.ai/babi,' which refers to the dataset. However, it does not provide an explicit statement or link for the open-source code of the methodology described in the paper (e.g., their Memory Network implementation).
Open Datasets | Yes | The paper states: 'All our tasks involve a restaurant reservation system, where the goal is to book a table at a restaurant. The first five tasks are generated by a simulation, the last one uses real human-bot dialogs. The data for all tasks is available at http://fb.ai/babi.' Table 1 also lists 'Training dialogs' with specific counts.
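The on-disk layout of the released data is not detailed in this review. Assuming the bAbI-dialog plain-text convention (numbered turns, user and bot utterances separated by a tab, blank line between dialogs), a minimal parser might look like this; the file name in the usage comment is a guess, not confirmed here:

```python
def parse_dialogs(lines):
    """Parse dialogs in the assumed bAbI-dialog layout:
    '<turn-id> <user utterance>\\t<bot utterance>' per line,
    with a blank line separating dialogs."""
    dialogs, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if current:                        # blank line closes a dialog
                dialogs.append(current)
                current = []
            continue
        _, _, rest = line.partition(" ")       # drop the leading turn id
        user, _, bot = rest.partition("\t")    # split user / bot utterances
        current.append((user, bot))
    if current:
        dialogs.append(current)
    return dialogs

# Hypothetical usage:
# dialogs = parse_dialogs(open("dialog-babi-task1-API-calls-trn.txt"))
```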
Dataset Splits | Yes | Table 1 ('Data used in this paper') lists 'Training dialogs', 'Validation dialogs', and 'Test dialogs' with specific counts for each task; e.g., for Tasks 1-5: 1,000 training, 1,000 validation, and 1,000 test dialogs.
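The split sizes quoted from Table 1 lend themselves to a simple reproducibility check. A sketch, using only the counts reported above for the simulated Tasks 1-5 (the loader producing `counts` is left to the reader):

```python
# Dialog counts from Table 1 for Tasks 1-5 (1,000 per split).
EXPECTED_SPLITS = {"train": 1000, "valid": 1000, "test": 1000}

def mismatched_splits(counts, expected=EXPECTED_SPLITS):
    """Return the names of splits whose dialog count differs from Table 1."""
    return [name for name, n in expected.items() if counts.get(name) != n]

# Example: a corpus matching the published counts passes the check.
counts = {"train": 1000, "valid": 1000, "test": 1000}
```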
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, processor types, or memory amounts used for running its experiments. It does not mention any cloud or cluster resources with specifications.
Software Dependencies | No | The paper discusses various models and training procedures (e.g., models 'trained using stochastic gradient descent (SGD)') and refers to architectures only by citation (e.g., the 'MemN2N architecture of Sukhbaatar et al. (2015)'). However, it does not provide specific software dependencies with version numbers, such as Python versions, deep learning frameworks (e.g., TensorFlow, PyTorch), or CUDA versions.
Experiment Setup | Yes | Appendix C, titled 'HYPERPARAMETERS', explicitly lists the hyperparameter values of the best Supervised Embeddings (Table 8) and Memory Networks (Table 9) models selected for each task. These include 'Learning Rate', 'Margin m', 'Embedding Dim d', 'Negative Cand. N', and 'Nb Hops'.
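To make the role of these hyperparameters concrete, here is a minimal sketch of one SGD step on a margin-ranking loss for a supervised bag-of-words embedding model, in the spirit of the Supervised Embeddings baseline. The hyperparameter names mirror Appendix C, but every value below is an illustrative placeholder, not one selected in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Names follow Appendix C ('Learning Rate', 'Margin m', 'Embedding Dim d',
# 'Negative Cand. N'); the values are placeholders.
learning_rate = 0.01
margin_m = 0.1
embed_dim_d = 32
neg_cand_N = 100

vocab_size = 50
A = rng.normal(scale=0.1, size=(vocab_size, embed_dim_d))  # context-side word embeddings
B = rng.normal(scale=0.1, size=(vocab_size, embed_dim_d))  # response-side word embeddings

def embed(matrix, word_ids):
    """Bag-of-words embedding: sum the rows for the given word ids."""
    return matrix[word_ids].sum(axis=0)

def sgd_step(context_ids, pos_ids, neg_ids_list):
    """One SGD step on the hinge loss max(0, m - s(c, pos) + s(c, neg)),
    where s(x, y) is the inner product of the two bag-of-words embeddings."""
    c = embed(A, context_ids)
    pos = embed(B, pos_ids)
    for neg_ids in neg_ids_list:
        neg = embed(B, neg_ids)
        if margin_m - c @ pos + c @ neg > 0:          # hinge active: apply gradients
            A[context_ids] -= learning_rate * (neg - pos)
            B[pos_ids] += learning_rate * c
            B[neg_ids] -= learning_rate * c

# Example step with arbitrary word ids; in training, neg_cand_N negative
# responses would be sampled for each (context, correct response) pair.
sgd_step(context_ids=[1, 2, 3], pos_ids=[4, 5], neg_ids_list=[[6, 7], [8]])
```

The sketch keeps only the pieces the review can ground: the SGD training procedure and the hyperparameter names reported in Appendix C.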