Conversational Model Adaptation via KL Divergence Regularization

Authors: Juncen Li, Ping Luo, Fen Lin, Bo Chen

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also evaluate the performance of this adaptation model for the online chatbots in Wechat platform of public accounts using both the BLEU metric and human judgement. The experiments empirically show that the proposed method visibly improves these evaluation metrics." (see the BLEU sketch below the table)
Researcher Affiliation | Collaboration | Juncen Li (1), Ping Luo (2,3), Fen Lin (1), Bo Chen (1). (1) WeChat Search Application Department, Tencent, China, {juncenli}@tencent.com, {felicialin,jennychen}@tencent.com; (2) Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China, {luop}@ics.ict.ac.cn; (3) University of Chinese Academy of Sciences, Beijing 100049, China
Pseudocode | Yes | "The framework of our method is shown in Algorithm 1." Algorithm 1: KLD REGULARIZED ADAPTATION(Ds, Dt). (see the loss sketch below the table)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "We collect source domain data from Tencent Weibo using a similar method described in (Wang et al. 2013). The source domain data includes 1,903,512 pairs and 26,814 words."
Dataset Splits | Yes | "We randomly divide this data into training, validation and test set with no overlap posts." Table 1 (Data statistics) reports: training pairs 12,322; validation pairs 1,449; test posts 1,000. (see the split sketch below the table)
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as GRU (Cho et al. 2014b) and Adadelta (Zeiler 2012) but does not provide version numbers for any library or framework.
Experiment Setup | Yes | "For the RNN-based encoder-decoder model, we use 1-layer GRU with 512 cells for both the encoder and the decoder. Word embeddings are treated separately for the encoder and the decoder as suggested in (Shang, Lu, and Li 2015). Embedding dimensions are set to 128. All parameters are initialized with the uniform distribution between -0.1 and 0.1. The activation function we use is maxout which can effectively avoid overfitting (Goodfellow et al. 2013). We use Adadelta (Zeiler 2012) in training and a minibatch size of 128. (...) For the adaptation method, we set regularization weight α to 0.5." (see the configuration sketch below the table)
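
For the BLEU evaluation cited in the Research Type row, a minimal scoring sketch using NLTK's corpus_bleu is shown below. The paper does not say which BLEU implementation or smoothing variant it used, so the smoothing choice here is an assumption.

    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    def bleu(references, hypotheses):
        """Corpus-level BLEU over tokenized responses. references[i] is a
        list of acceptable token lists for post i; hypotheses[i] is the
        generated token list. Smoothing guards against zero n-gram counts
        on short chatbot replies (an assumption, not the paper's choice)."""
        return corpus_bleu(references, hypotheses,
                           smoothing_function=SmoothingFunction().method1)

    # Example: one post with two acceptable reference responses.
    refs = [[["have", "a", "nice", "day"], ["enjoy", "your", "day"]]]
    hyps = [["have", "a", "good", "day"]]
    print(bleu(refs, hyps))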
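The Pseudocode row cites Algorithm 1, KLD REGULARIZED ADAPTATION(Ds, Dt). Below is a minimal PyTorch sketch of a KL-regularized adaptation loss, assuming the regularizer keeps the adapted decoder's per-token distribution close to that of a frozen source-domain model; the function and tensor names are ours, not the paper's.

    import torch.nn.functional as F

    def kld_regularized_loss(adapted_logits, source_logits, target_ids, alpha=0.5):
        """Target-domain cross-entropy plus a KL term that keeps the adapted
        model's per-token distribution close to a frozen source-domain model.

        adapted_logits, source_logits: (batch, seq_len, vocab) decoder outputs.
        target_ids: (batch, seq_len) gold response token ids.
        alpha: regularization weight (the paper reports alpha = 0.5).
        """
        vocab = adapted_logits.size(-1)
        # Standard seq2seq negative log-likelihood on the target-domain data.
        nll = F.cross_entropy(adapted_logits.reshape(-1, vocab),
                              target_ids.reshape(-1))
        # KL(source || adapted): F.kl_div expects log-probs for the model
        # being trained and probs for the fixed reference distribution.
        kl = F.kl_div(
            F.log_softmax(adapted_logits, dim=-1),
            F.softmax(source_logits.detach(), dim=-1),  # frozen source model
            reduction="batchmean",
        )
        return nll + alpha * kl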
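The Dataset Splits row quotes a random split with no posts shared across sets. A hedged sketch of such a post-level split follows; only the "no overlap posts" constraint and the 1,000-test-post count come from the paper, while the validation fraction, seed, and helper names are assumptions.

    import random
    from collections import defaultdict

    def split_by_post(pairs, n_test_posts=1000, valid_frac=0.1, seed=0):
        """Split (post, response) pairs so that no post appears in more than
        one of train/validation/test. Only n_test_posts=1000 comes from the
        paper; valid_frac and seed are assumptions."""
        by_post = defaultdict(list)
        for post, resp in pairs:
            by_post[post].append(resp)

        posts = list(by_post)
        random.Random(seed).shuffle(posts)

        test_posts = posts[:n_test_posts]  # test set is tracked per post
        rest = posts[n_test_posts:]
        n_valid = int(len(rest) * valid_frac)

        def expand(post_subset):
            return [(p, r) for p in post_subset for r in by_post[p]]

        return expand(rest[n_valid:]), expand(rest[:n_valid]), test_posts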
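Finally, the Experiment Setup row pins down most model hyperparameters. The sketch below wires them into a PyTorch encoder-decoder; the paper does not describe attention or the exact maxout wiring, so those details (k=2 maxout pieces, no attention) are assumptions.

    import torch
    import torch.nn as nn

    VOCAB = 26814   # source-domain vocabulary size reported in the paper
    EMB_DIM = 128   # embedding dimensions
    HID_DIM = 512   # 1-layer GRU with 512 cells for encoder and decoder

    class EncoderDecoder(nn.Module):
        """Configuration sketch matching the reported setup; the paper's
        exact architecture may differ."""
        def __init__(self):
            super().__init__()
            # Separate embeddings for encoder and decoder, as in the paper.
            self.src_emb = nn.Embedding(VOCAB, EMB_DIM)
            self.tgt_emb = nn.Embedding(VOCAB, EMB_DIM)
            self.encoder = nn.GRU(EMB_DIM, HID_DIM, num_layers=1, batch_first=True)
            self.decoder = nn.GRU(EMB_DIM, HID_DIM, num_layers=1, batch_first=True)
            # Maxout over k=2 linear pieces before the output projection
            # (k=2 is our assumption).
            self.pre_out = nn.Linear(HID_DIM, 2 * HID_DIM)
            self.out = nn.Linear(HID_DIM, VOCAB)
            # All parameters initialized uniformly in [-0.1, 0.1].
            for p in self.parameters():
                nn.init.uniform_(p, -0.1, 0.1)

        def forward(self, src_ids, tgt_ids):
            _, h = self.encoder(self.src_emb(src_ids))
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
            # Maxout activation: elementwise max over the two linear pieces.
            pieces = self.pre_out(dec_out).view(*dec_out.shape[:-1], HID_DIM, 2)
            return self.out(pieces.max(dim=-1).values)

    model = EncoderDecoder()
    optimizer = torch.optim.Adadelta(model.parameters())  # trained with minibatches of 128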