Modeling Dialogues with Hashcode Representations: A Nonparametric Approach

Authors: Sahil Garg, Irina Rish, Guillermo Cecchi, Palash Goyal, Sarik Ghazarian, Shuyang Gao, Greg Ver Steeg, Aram Galstyan

AAAI 2020, pp. 3970-3979 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As demonstrated on three real-life datasets, including prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural network based dialogue systems, both in terms of computational efficiency, reducing training time from days or weeks to hours, and response quality, achieving an order of magnitude improvement over competitors in frequency of being chosen as the best model by human evaluators.
Researcher Affiliation | Collaboration | (1) USC Information Sciences Institute, (2) IBM T. J. Watson Research Center
Pseudocode | Yes | Algorithm 1: Response Generation via Hashing of N-grams (a hedged retrieval-style sketch of this idea appears after the table).
Open Source Code | No | The paper does not provide concrete access to open-source code for the described methodology. It mentions an arXiv link for an extended version, but this is not a code repository.
Open Datasets | Yes | The three datasets used in our experiments include (1) depression therapy sessions, (2) Larry King TV interviews and (3) a Twitter dataset. The depression therapy dataset [4] consists of transcribed recordings... The Larry King dataset [5] contains transcripts of interviews... Next, we experimented with the Twitter Dialogue Corpus (Ritter, Cherry, and Dolan 2010). Footnotes provide URLs: [4] https://alexanderstreet.com/products/counseling-and-psychotherapy-transcripts-series [5] http://transcripts.cnn.com/TRANSCRIPTS/lkl.html
Dataset Splits | Yes | We select 10% of the data randomly as a test set (4200 samples), and then perform another random 90/10 split of the remaining 38,000 samples into training and validation subsets, respectively. (A code sketch of this split follows the table.)
Hardware Specification | No | The paper mentions running experiments 'on a 1000-core GPU' and 'on a 16-core CPU', but these are general descriptions and do not specify exact models, manufacturers, or other detailed specifications for the hardware used.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The vocabulary size for the input is set via grid search over values from 1000 to 100000. The neural network structures are chosen by an informal search over a set of architectures, and we set the maximum number of gradient steps to 80, the validation frequency to 500, and the step-size decay for SGD to 1e-4. (These settings are gathered into an illustrative config after the table.)
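
On the pseudocode row: the paper's Algorithm 1 is not reproduced here, but a minimal, hypothetical sketch of retrieval-style response generation via hashing of n-grams can illustrate the general idea, assuming binary hashcodes obtained from random hyperplanes over n-gram counts and nearest-neighbour lookup by Hamming distance. All function and parameter names below are illustrative and not taken from the paper; the random-hyperplane projection is only a stand-in for whatever hash functions the paper actually learns.

```python
# Hypothetical sketch of retrieval-style response generation via hashing of n-grams.
# Not the paper's Algorithm 1: feature choice, hash function, and retrieval rule are assumptions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

rng = np.random.default_rng(0)

def fit_hasher(contexts, n_bits=64, ngram_range=(1, 2)):
    """Fit an n-gram vectorizer and a random-hyperplane (LSH-style) projection."""
    vec = CountVectorizer(ngram_range=ngram_range)
    X = vec.fit_transform(contexts)                       # sparse n-gram count vectors
    planes = rng.standard_normal((X.shape[1], n_bits))    # random hyperplanes
    codes = np.asarray(X @ planes > 0).astype(np.uint8)   # binary hashcodes per context
    return vec, planes, codes

def generate_response(query, vec, planes, codes, responses):
    """Return the stored response whose context hashcode is nearest in Hamming distance."""
    q = np.asarray(vec.transform([query]) @ planes > 0).astype(np.uint8)
    dists = (codes != q).sum(axis=1)                       # Hamming distances to all contexts
    return responses[int(np.argmin(dists))]

# Usage: pair each training context with its observed response, then retrieve.
contexts = ["how are you feeling today", "tell me about your week"]
responses = ["i have been feeling anxious", "it was a difficult week"]
vec, planes, codes = fit_hasher(contexts)
print(generate_response("how do you feel today", vec, planes, codes, responses))
```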
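
On the dataset-splits row: a minimal sketch of the reported 10% test / 90-10 train-validation split, assuming scikit-learn's train_test_split over context-response pairs; the paper does not state its tooling.

```python
# Sketch of the reported split: 10% random test set, then a 90/10 train/validation
# split of the remainder. train_test_split is assumed tooling, not from the paper.
from sklearn.model_selection import train_test_split

def split_dialogue_pairs(pairs, seed=0):
    trainval, test = train_test_split(pairs, test_size=0.10, random_state=seed)
    train, val = train_test_split(trainval, test_size=0.10, random_state=seed)
    return train, val, test

# With roughly 42,000 context-response pairs this yields about 4,200 test samples
# and a 90/10 split of the remaining ~38,000 into training and validation.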
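
On the experiment-setup row: the reported settings could be collected into a configuration such as the one below. Only the endpoints of the vocabulary grid (1000 to 100000) and the three training values come from the paper; the intermediate grid values and the key names are assumptions.

```python
# Illustrative configuration of the reported settings; intermediate vocabulary grid
# values are assumptions, the remaining numbers are as stated in the paper.
config = {
    "vocab_size_grid": [1000, 3000, 10000, 30000, 100000],  # grid-searched in [1000, 100000]
    "max_gradient_steps": 80,
    "validation_frequency": 500,
    "sgd_step_size_decay": 1e-4,
}
```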