Exploring Personalized Neural Conversational Models

Authors: Satwik Kottur, Xiaoyu Wang, Vitor Carvalho

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation on the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleaning, diversity-based reranking, evaluation setting, etc.
Researcher Affiliation | Collaboration | Carnegie Mellon University, Pittsburgh, PA; Snap Inc., Venice, CA
Pseudocode | No | The paper describes model components using mathematical equations and text, but does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not include an explicit statement or link indicating that the authors' implementation code for the described methodology is publicly available.
Open Datasets | Yes | Movie-DiC dataset: The Movie-DiC dataset [Banchs, 2012] was collected through The Internet Movie Script Data Collection (IMSDb), which contains publicly available movie subtitles. ... TV-Series dataset: We use freely available transcripts for two American television comedy shows, Friends and The Big Bang Theory, to construct our TV-Series dataset. ... SubTle dataset: The third dataset we consider is the SubTle [Ameixa et al., 2014], which is an enormous, non-dialog corpus extracted from movie subtitles.
Dataset Splits | Yes | Next, we split the dataset into three non-overlapping partitions: train (80%), validation (10%) and test (10%). (A minimal split sketch is given after the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions using the "deep learning framework Torch [Collobert et al., 2011]" and "NLTK library [Bird et al., 2009]", but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use 2-layered Gated Recurrent Units (GRU) for both the encoder and decoder with a dropout [Srivastava et al., 2014] of 0.2. The parameters of the network are learnt through the standard back-propagation algorithm with the Adam optimizer [Kingma and Ba, 2014]. The learning rate is set to 0.001 and is decayed exponentially to 0.0001 by the end of 10 epochs... Word embedding size is 300, speaker embedding size is 50, number of hidden units for encoder/decoder GRU is 300, number of hidden units for context GRU is 50. (A hedged configuration sketch is given after the table.)
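
The 80/10/10 split reported in the Dataset Splits row can be illustrated generically. The sketch below is not the authors' code: the function name split_dialogs, the shuffling step, and the fixed seed are assumptions, since the paper only states the partition ratios.

```python
import random

def split_dialogs(dialogs, seed=0):
    """Partition a list of dialogs into non-overlapping train/validation/test
    sets with the 80/10/10 ratio quoted above. Shuffling and the fixed seed
    are illustrative assumptions, not details from the paper."""
    dialogs = list(dialogs)
    random.Random(seed).shuffle(dialogs)
    n_train = int(0.8 * len(dialogs))
    n_val = int(0.1 * len(dialogs))
    train = dialogs[:n_train]
    val = dialogs[n_train:n_train + n_val]
    test = dialogs[n_train + n_val:]
    return train, val, test

# Usage: train, val, test = split_dialogs(dialogs)
```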
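
The hyperparameters quoted in the Experiment Setup row can be gathered into one configuration sketch. The paper's experiments were implemented in (Lua) Torch and the code is not released; the PyTorch wiring below, including the class name Seq2SeqSketch, the vocabulary size, the number of speakers, and how the speaker embedding is concatenated at the decoder input, is a hypothetical reconstruction that only mirrors the quoted layer sizes, dropout, optimizer, and learning-rate decay.

```python
import torch
import torch.nn as nn

# Sizes quoted in the Experiment Setup row.
WORD_EMB = 300       # word embedding size
SPEAKER_EMB = 50     # speaker embedding size
HIDDEN = 300         # encoder/decoder GRU hidden units
CONTEXT_HIDDEN = 50  # context GRU hidden units
DROPOUT = 0.2
LR_START, LR_END, EPOCHS = 1e-3, 1e-4, 10

class Seq2SeqSketch(nn.Module):
    """Hypothetical layer wiring; the authors' Torch implementation is not public."""
    def __init__(self, vocab_size, num_speakers):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, WORD_EMB)
        self.speaker_emb = nn.Embedding(num_speakers, SPEAKER_EMB)
        self.encoder = nn.GRU(WORD_EMB, HIDDEN, num_layers=2,
                              dropout=DROPOUT, batch_first=True)
        # Assumption: decoder input is the word embedding concatenated
        # with a speaker embedding for the target speaker.
        self.decoder = nn.GRU(WORD_EMB + SPEAKER_EMB, HIDDEN, num_layers=2,
                              dropout=DROPOUT, batch_first=True)
        self.context = nn.GRU(HIDDEN, CONTEXT_HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, vocab_size)

# Illustrative vocabulary and speaker counts, not values from the paper.
model = Seq2SeqSketch(vocab_size=10000, num_speakers=100)
optimizer = torch.optim.Adam(model.parameters(), lr=LR_START)
# Exponential decay from 0.001 to 0.0001 over 10 epochs:
# gamma = (1e-4 / 1e-3) ** (1 / 10).
scheduler = torch.optim.lr_scheduler.ExponentialLR(
    optimizer, gamma=(LR_END / LR_START) ** (1.0 / EPOCHS))
```

Calling scheduler.step() once per epoch takes the learning rate from 0.001 to 0.0001 over the 10 quoted epochs.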