Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues
Authors: Youngsoo Jang, Jongmin Lee, Kee-Eung Kim (pp. 7994–8001)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we demonstrate that our Bayes-adaptive dialogue planning agent significantly outperforms the state-of-the-art in a negotiation dialogue domain. |
| Researcher Affiliation | Academia | Youngsoo Jang¹, Jongmin Lee¹, Kee-Eung Kim¹,² — ¹School of Computing, KAIST, Daejeon, Republic of Korea; ²Graduate School of AI, KAIST, Daejeon, Republic of Korea; {ysjang, jmlee}@ai.kaist.ac.kr, kekim@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Bayes-Adaptive Dialogue Planning (BADP) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a specific repository link, an explicit code release statement, or mention of code in supplementary materials. |
| Open Datasets | Yes | We use human-human negotiation dialogues as the pretraining data collected by Lewis et al. (2017). |
| Dataset Splits | No | The paper states, "For the policy improvement, we used the 12258 dialogues from the selfplay with planning as the training data for each supervised learning step," but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'RNN-based dialogue generation' and 'attention-based sequence-to-sequence RNN model', but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions) needed to replicate the experiment. |
| Experiment Setup | Yes | For BADP, we use an exploration constant for UCT of 5, the number of actions for each node of 15, and the number of simulations of 300. |
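The reported setup (UCT exploration constant 5, 15 actions per node, 300 simulations) plugs directly into the standard UCB1 selection rule used by UCT-style planners. The sketch below is a minimal, generic illustration of that rule under the paper's reported hyperparameters; the `ucb1` helper is hypothetical and is not taken from the paper's BADP implementation.

```python
import math

# Hyperparameters reported in the paper's experiment setup (assumed to
# parameterize a standard UCT search; the full BADP algorithm also
# maintains a Bayesian posterior, which is omitted here).
UCT_C = 5              # UCT exploration constant
NUM_ACTIONS = 15       # candidate actions expanded per tree node
NUM_SIMULATIONS = 300  # simulations per planning step

def ucb1(parent_visits, child_visits, child_value, c=UCT_C):
    """UCB1 score: mean return plus an exploration bonus.

    parent_visits: visit count of the parent node
    child_visits:  visit count of this child action
    child_value:   cumulative return observed through this child
    """
    if child_visits == 0:
        # Unvisited children score infinity, so each of the
        # NUM_ACTIONS candidates is tried at least once.
        return float("inf")
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

During each of the `NUM_SIMULATIONS` tree traversals, the planner would select `argmax` of `ucb1` over the node's children; a larger constant like 5 (versus the common 1.4) weights exploration more heavily.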