Modeling Dialogues with Hashcode Representations: A Nonparametric Approach
Authors: Sahil Garg, Irina Rish, Guillermo Cecchi, Palash Goyal, Sarik Ghazarian, Shuyang Gao, Greg Ver Steeg, Aram Galstyan
AAAI 2020, pp. 3970-3979 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As demonstrated on three real-life datasets, most prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural-network-based dialogue systems, both in computational efficiency (reducing training time from days or weeks to hours) and in response quality (an order-of-magnitude improvement over competitors in how often human evaluators chose it as the best model). |
| Researcher Affiliation | Collaboration | USC Information Sciences Institute; IBM T. J. Watson Research Center |
| Pseudocode | Yes | Algorithm 1: Response Generation via Hashing of N-grams (a hedged sketch of the general retrieval idea appears below the table) |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology. It mentions an arXiv link to an extended version, but that is not a code repository. |
| Open Datasets | Yes | The three datasets used in our experiments include (1) depression therapy sessions, (2) Larry King TV interviews and (3) Twitter dataset. The depression therapy dataset consists of transcribed recordings... The Larry King dataset contains transcripts of interviews... Next, we experimented with the Twitter Dialogue Corpus (Ritter, Cherry, and Dolan 2010). Footnoted URLs: https://alexanderstreet.com/products/counseling-and-psychotherapy-transcripts-series and http://transcripts.cnn.com/TRANSCRIPTS/lkl.html |
| Dataset Splits | Yes | We select 10% of the data randomly as a test set (4200 samples), and then perform another random 90/10 split of the remaining 38,000 samples into training and validation subsets, respectively. (See the split sketch below the table.) |
| Hardware Specification | No | The paper mentions running experiments 'on a 1000-core GPU' and 'on a 16-core CPU', but these are general descriptions and do not specify exact models, manufacturers, or other detailed specifications for the hardware used. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The vocabulary size for the input is set via grid search over values from 1000 to 100000. The neural network structures are chosen by an informal search over a set of architectures; the maximum number of gradient steps is set to 80, the validation frequency to 500, and the step-size decay for SGD to 1e-4. (See the grid-search sketch below the table.) |
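
The Pseudocode row names "Algorithm 1: Response Generation via Hashing of N-grams". Below is a minimal, hedged sketch of that general idea, not the authors' exact algorithm: dialogue contexts are hashed into binary codes with random hyperplanes over n-gram features, and the response attached to the nearest training context in Hamming distance is returned. The 16-bit code length, TF-IDF features, and the `respond` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy (context -> response) pairs standing in for the dialogue corpora described above.
contexts = ["how are you feeling today", "tell me about your week", "what brings you here"]
responses = ["i have been feeling anxious", "it was a stressful week", "i want to talk about my sleep"]

# N-gram features over contexts; the vocabulary size is what the paper grid-searches.
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(contexts).toarray()

# Random-hyperplane hashing: one bit per hyperplane sign (16 bits chosen arbitrarily here).
rng = np.random.default_rng(0)
planes = rng.standard_normal((X.shape[1], 16))
train_codes = (X @ planes > 0).astype(np.uint8)

def respond(query: str) -> str:
    """Return the stored response whose context hashcode is nearest in Hamming distance."""
    q = (vec.transform([query]).toarray() @ planes > 0).astype(np.uint8)
    hamming = (train_codes != q).sum(axis=1)
    return responses[int(hamming.argmin())]

print(respond("how do you feel this week"))
```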
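
The Dataset Splits row describes a nested random split: 10% held out for test, then a 90/10 train/validation split of the remainder. A minimal sketch with scikit-learn, assuming a 42,000-sample corpus so the counts roughly match the reported 4,200 test samples:

```python
from sklearn.model_selection import train_test_split

# Placeholder (context, response) pairs; the real data are the corpora listed above.
samples = [(f"context {i}", f"response {i}") for i in range(42000)]

# 10% of the data is held out as the test set (~4,200 samples).
train_val, test = train_test_split(samples, test_size=0.10, random_state=0)

# The remaining ~38,000 samples are split 90/10 into training and validation subsets.
train, val = train_test_split(train_val, test_size=0.10, random_state=0)

print(len(train), len(val), len(test))  # 34020 3780 4200
```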
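
The Experiment Setup row mentions a grid search over input vocabulary sizes from 1000 to 100000. A hedged sketch of such a search follows; the candidate values and the `validation_score` stand-in are assumptions, not the authors' code:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Candidate vocabulary sizes spanning the reported 1000-100000 range (grid values assumed).
vocab_sizes = [1000, 5000, 10000, 50000, 100000]

def validation_score(vocab_size, train_texts, val_texts):
    """Hypothetical stand-in: build a vocabulary of the given size, train the dialogue
    model with it, and return its validation metric."""
    vec = CountVectorizer(max_features=vocab_size).fit(train_texts)
    return float(len(vec.vocabulary_))  # placeholder score for illustration only

train_texts = ["hello there", "how are you"]  # placeholder data
val_texts = ["fine thanks"]

best_size = max(vocab_sizes, key=lambda v: validation_score(v, train_texts, val_texts))
print("selected vocabulary size:", best_size)
```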