Modeling Dialogues with Hashcode Representations: A Nonparametric Approach

Authors: Sahil Garg, Irina Rish, Guillermo Cecchi, Palash Goyal, Sarik Ghazarian, Shuyang Gao, Greg Ver Steeg, Aram Galstyan

AAAI 2020, pp. 3970-3979 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As demonstrated on three real-life datasets, including prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural network based dialogue systems, both in terms of computational efficiency, reducing training time from days or weeks to hours, and response quality, achieving an order of magnitude improvement over competitors in frequency of being chosen as the best model by human evaluators.
Researcher Affiliation | Collaboration | (1) USC Information Sciences Institute, (2) IBM T. J. Watson Research Center
Pseudocode | Yes | Algorithm 1: Response Generation via Hashing of N-grams (a hedged retrieval-style sketch of this idea appears after the table).
Open Source Code | No | The paper does not provide concrete access to open-source code for the described methodology. It mentions an arXiv link for an extended version, but this is not a code repository.
Open Datasets | Yes | The three datasets used in our experiments include (1) depression therapy sessions, (2) Larry King TV interviews and (3) a Twitter dataset. The depression therapy dataset [4] consists of transcribed recordings... The Larry King dataset [5] contains transcripts of interviews... Next, we experimented with the Twitter Dialogue Corpus (Ritter, Cherry, and Dolan 2010). Footnotes provide URLs: [4] https://alexanderstreet.com/products/counseling-and-psychotherapy-transcripts-series [5] http://transcripts.cnn.com/TRANSCRIPTS/lkl.html
Dataset Splits | Yes | We select 10% of the data randomly as a test set (4200 samples), and then perform another random 90/10 split of the remaining 38,000 samples into training and validation subsets, respectively. (A code sketch of this split follows the table.)
Hardware Specification | No | The paper mentions running experiments 'on a 1000-core GPU' and 'on a 16-core CPU', but these are general descriptions and do not specify exact models, manufacturers, or other detailed specifications for the hardware used.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The vocabulary size for the input is set via grid search over values from 1000 to 100000. The neural network structures are chosen by an informal search over a set of architectures, and we set the maximum number of gradient steps to 80, the validation frequency to 500, and the step-size decay for SGD to 1e-4. (These settings are gathered into an illustrative config after the table.)
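
On the pseudocode row: the paper's Algorithm 1 is not reproduced here, but a minimal, hypothetical sketch of retrieval-style response generation via hashing of n-grams can illustrate the general idea, assuming binary hashcodes obtained from random hyperplanes over n-gram counts and nearest-neighbour lookup by Hamming distance. All function and parameter names below are illustrative and not taken from the paper; the random-hyperplane projection is only a stand-in for whatever hash functions the paper actually learns.

```python
# Hypothetical sketch of retrieval-style response generation via hashing of n-grams.
# Not the paper's Algorithm 1: feature choice, hash function, and retrieval rule are assumptions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

rng = np.random.default_rng(0)

def fit_hasher(contexts, n_bits=64, ngram_range=(1, 2)):
    """Fit an n-gram vectorizer and a random-hyperplane (LSH-style) projection."""
    vec = CountVectorizer(ngram_range=ngram_range)
    X = vec.fit_transform(contexts)                       # sparse n-gram count vectors
    planes = rng.standard_normal((X.shape[1], n_bits))    # random hyperplanes
    codes = np.asarray(X @ planes > 0).astype(np.uint8)   # binary hashcodes per context
    return vec, planes, codes

def generate_response(query, vec, planes, codes, responses):
    """Return the stored response whose context hashcode is nearest in Hamming distance."""
    q = np.asarray(vec.transform([query]) @ planes > 0).astype(np.uint8)
    dists = (codes != q).sum(axis=1)                       # Hamming distances to all contexts
    return responses[int(np.argmin(dists))]

# Usage: pair each training context with its observed response, then retrieve.
contexts = ["how are you feeling today", "tell me about your week"]
responses = ["i have been feeling anxious", "it was a difficult week"]
vec, planes, codes = fit_hasher(contexts)
print(generate_response("how do you feel today", vec, planes, codes, responses))
```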
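
On the dataset-splits row: a minimal sketch of the reported 10% test / 90-10 train-validation split, assuming scikit-learn's train_test_split over context-response pairs; the paper does not state its tooling.

```python
# Sketch of the reported split: 10% random test set, then a 90/10 train/validation
# split of the remainder. train_test_split is assumed tooling, not from the paper.
from sklearn.model_selection import train_test_split

def split_dialogue_pairs(pairs, seed=0):
    trainval, test = train_test_split(pairs, test_size=0.10, random_state=seed)
    train, val = train_test_split(trainval, test_size=0.10, random_state=seed)
    return train, val, test

# With roughly 42,000 context-response pairs this yields about 4,200 test samples
# and a 90/10 split of the remaining ~38,000 into training and validation.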
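
On the experiment-setup row: the reported settings could be collected into a configuration such as the one below. Only the endpoints of the vocabulary grid (1000 to 100000) and the three training values come from the paper; the intermediate grid values and the key names are assumptions.

```python
# Illustrative configuration of the reported settings; intermediate vocabulary grid
# values are assumptions, the remaining numbers are as stated in the paper.
config = {
    "vocab_size_grid": [1000, 3000, 10000, 30000, 100000],  # grid-searched in [1000, 100000]
    "max_gradient_steps": 80,
    "validation_frequency": 500,
    "sgd_step_size_decay": 1e-4,
}
```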