Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue
Authors: Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that the proposed model improves the knowledge selection accuracy and subsequently the performance of utterance generation. We achieve the new state-of-the-art performance on Wizard of Wikipedia (Dinan et al., 2019) as one of the most large-scale and challenging benchmarks. We further validate the effectiveness of our model over existing conversation methods in another knowledge-based dialogue Holl-E dataset (Moghe et al., 2018). |
| Researcher Affiliation | Academia | Byeongchang Kim Jaewoo Ahn Gunhee Kim Department of Computer Science and Engineering Seoul National University, Seoul, Korea {byeongchang.kim,jaewoo.ahn}@vision.snu.ac.kr gunhee@snu.ac.kr |
| Pseudocode | No | The paper describes the model architecture and equations in prose and mathematical notation but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions making 'new set of GT annotations available in the project page' for one dataset and cites a BERT repository used for a baseline, but it does not explicitly state that the source code for their proposed model (SKT) is being released or provide a link to it. |
| Open Datasets | Yes | As a main testbed of our research, we choose the Wizard of Wikipedia (WoW) benchmark (Dinan et al., 2019)... In our experiments, we also evaluate on Holl-E (Moghe et al., 2018) as another dataset for knowledge-grounded dialogue, after collecting clearer labels of knowledge sentences. We make our new set of GT annotations available in the project page. |
| Dataset Splits | Yes | Wizard of Wikipedia. It contains 18,430 dialogues for training, 1,948 dialogues for validation and 1,933 dialogues for test. The test set is split into two subsets, Test Seen and Test Unseen. Holl-E. It contains 7,228 dialogues for training, 930 dialogues for validation and 913 dialogues for test. |
| Hardware Specification | Yes | We train our model up to 5 epochs on two NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper mentions using Adam optimizer, fastText, and BERT-Base, but it does not provide specific version numbers for these software components or libraries required for replication. |
| Experiment Setup | Yes | All the parameters except pretrained parts are initialized with Xavier method (Glorot & Bengio, 2010). We use Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999, ϵ = 1e-07. For the models without BERT, we set the learning rate to 0.001... For the models with BERT, we set the learning rate to 0.00002... We apply label smoothing... and set 0.1 and 0.05 for each. We set the temperature of Gumbel-Softmax to τ = 0.1 and the hyperparameter for the knowledge loss to λ = 1.0. |
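The Experiment Setup row can be read as a concrete training configuration. Below is a minimal PyTorch sketch of how the reported hyperparameters (Adam with β1 = 0.9, β2 = 0.999, ϵ = 1e-07, learning rates of 0.001 without BERT and 0.00002 with BERT, label smoothing of 0.1 and 0.05, Gumbel-Softmax temperature τ = 0.1, knowledge-loss weight λ = 1.0) might be wired together. This is not the authors' code: the names `ToyKnowledgeSelector`, `make_optimizer`, and `joint_loss` are hypothetical placeholders, and the assignment of the two label-smoothing values to the two losses is an assumption, since the quoted text only says "set 0.1 and 0.05 for each."

```python
# A minimal sketch, assuming a PyTorch training setup; only the numeric
# hyperparameters come from the quoted Experiment Setup row. All class and
# function names here are placeholders invented for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

TAU = 0.1          # Gumbel-Softmax temperature reported in the paper
LAMBDA_KS = 1.0    # weight lambda on the knowledge-selection loss
USE_BERT = True
LR = 2e-5 if USE_BERT else 1e-3   # 0.00002 with BERT, 0.001 without


class ToyKnowledgeSelector(nn.Module):
    """Stand-in for a knowledge-selection head (hypothetical)."""

    def __init__(self, hidden_dim: int, n_knowledge: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, n_knowledge)

    def forward(self, context: torch.Tensor):
        ks_logits = self.score(context)
        # Differentiable, near-one-hot knowledge sampling at tau = 0.1.
        sample = F.gumbel_softmax(ks_logits, tau=TAU, hard=True)
        return ks_logits, sample


def make_optimizer(model: nn.Module) -> torch.optim.Adam:
    # Adam with beta1 = 0.9, beta2 = 0.999, eps = 1e-07, as reported.
    return torch.optim.Adam(
        model.parameters(), lr=LR, betas=(0.9, 0.999), eps=1e-7
    )


def joint_loss(gen_logits, gen_targets, ks_logits, ks_targets):
    # Label smoothing of 0.1 and 0.05 (requires PyTorch >= 1.10); which loss
    # receives which value is an assumption, since the quote only says
    # "set 0.1 and 0.05 for each".
    gen_loss = F.cross_entropy(gen_logits, gen_targets, label_smoothing=0.1)
    ks_loss = F.cross_entropy(ks_logits, ks_targets, label_smoothing=0.05)
    return gen_loss + LAMBDA_KS * ks_loss
```

The Xavier initialization mentioned in the same row would correspond to applying `nn.init.xavier_uniform_` (or `xavier_normal_`) to the weight tensors of the non-pretrained modules before training.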