Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues
Authors: Youngsoo Jang, Jongmin Lee, Kee-Eung Kim (pp. 7994–8001)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we demonstrate that our Bayes-adaptive dialogue planning agent significantly outperforms the state-of-the-art in a negotiation dialogue domain. |
| Researcher Affiliation | Academia | Youngsoo Jang¹, Jongmin Lee¹, Kee-Eung Kim¹,² — ¹School of Computing, KAIST, Daejeon, Republic of Korea; ²Graduate School of AI, KAIST, Daejeon, Republic of Korea; {ysjang, jmlee}@ai.kaist.ac.kr, kekim@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Bayes-Adaptive Dialogue Planning (BADP) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a specific repository link, an explicit code release statement, or mention of code in supplementary materials. |
| Open Datasets | Yes | We use human-human negotiation dialogues as the pretraining data collected by Lewis et al. (2017). |
| Dataset Splits | No | The paper states, "For the policy improvement, we used the 12258 dialogues from the selfplay with planning as the training data for each supervised learning step," but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'RNN-based dialogue generation' and 'attention-based sequence-to-sequence RNN model', but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions) needed to replicate the experiment. |
| Experiment Setup | Yes | For BADP, we use an exploration constant for UCT of 5, the number of actions for each node of 15, and the number of simulations of 300. |
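The reported setup (UCT exploration constant 5, 15 actions per node, 300 simulations) plugs directly into the standard UCB1 selection rule used by UCT-style planners. The sketch below is a minimal, generic illustration of that rule under the paper's reported hyperparameters; the `ucb1` helper is hypothetical and is not taken from the paper's BADP implementation.

```python
import math

# Hyperparameters reported in the paper's experiment setup (assumed to
# parameterize a standard UCT search; the full BADP algorithm also
# maintains a Bayesian posterior, which is omitted here).
UCT_C = 5              # UCT exploration constant
NUM_ACTIONS = 15       # candidate actions expanded per tree node
NUM_SIMULATIONS = 300  # simulations per planning step

def ucb1(parent_visits, child_visits, child_value, c=UCT_C):
    """UCB1 score: mean return plus an exploration bonus.

    parent_visits: visit count of the parent node
    child_visits:  visit count of this child action
    child_value:   cumulative return observed through this child
    """
    if child_visits == 0:
        # Unvisited children score infinity, so each of the
        # NUM_ACTIONS candidates is tried at least once.
        return float("inf")
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

During each of the `NUM_SIMULATIONS` tree traversals, the planner would select `argmax` of `ucb1` over the node's children; a larger constant like 5 (versus the common 1.4) weights exploration more heavily.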