PCVAE: Generating Prior Context for Dialogue Response Generation
Authors: Zefeng Cai, Zerui Cai
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that PCVAE can generate distinct responses and significantly outperforms strong baselines. |
| Researcher Affiliation | Academia | Zefeng Cai, Zerui Cai; East China Normal University; oklen@foxmail.com, zrcai_flow@126.com |
| Pseudocode | No | The paper describes algorithms using prose and mathematical equations but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We employ two authoritative datasets for our experiment, including MultiWOZ [Zang et al., 2020] for cross-domain task-oriented dialogue and Cornell Movie [Danescu-Niculescu-Mizil and Lee, 2011] for open-domain dialogue. |
| Dataset Splits | No | The paper mentions training stops when 'worse valid loss is obtained in the validation phase', indicating a validation set is used. However, it does not provide specific details on the dataset splits (e.g., percentages or exact counts for train/validation/test sets) for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. It only details model architecture and training hyperparameters. |
| Software Dependencies | No | The paper mentions using 'Glove embedding' and 'NLTK tokenizer [Bird et al., 2009]' but does not provide specific version numbers for these or other software libraries (e.g., deep learning frameworks like PyTorch or TensorFlow). |
| Experiment Setup | Yes | We use word embeddings with 200 dimensions and hidden states with 300 dimensions for the encoding and decoding GRUs. [...] The number of layers ND of the MLPs for compression and reconstruction is set to 2, with hidden sizes ranging from 200 to 300. We use NK = 4 codebooks and NE = 8192 codewords. The γm used in moving-mean prediction is set to 0.95. The β used in vector quantization is set to 0.25. In training, we use batch size NB = 192 and the Adam optimizer with an initial learning rate of 1e-3 for both datasets. We decrease the learning rate by a factor of 0.8 when a worse validation loss is obtained in the validation phase and stop training once the learning rate drops to 1e-5. |
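
Although no code is released, the Experiment Setup row above reports enough hyperparameters to reconstruct a plausible training configuration. The sketch below shows one way to wire up those reported values in PyTorch (embedding size 200, GRU hidden size 300, NK = 4 codebooks with NE = 8192 codewords each, batch size 192, Adam at 1e-3, learning-rate decay by 0.8 on a worse validation loss, stopping below 1e-5). The placeholder model, the use of `ReduceLROnPlateau`, and every helper name are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical reconstruction of the reported PCVAE training setup.
# PyTorch and the placeholder model are assumptions; the authors released no code.
import torch
import torch.nn as nn

# Hyperparameters reported in the paper's Experiment Setup
EMB_DIM = 200        # word embedding size
HIDDEN_DIM = 300     # GRU hidden size for encoder/decoder
N_CODEBOOKS = 4      # NK
N_CODEWORDS = 8192   # NE per codebook
GAMMA_M = 0.95       # moving-mean coefficient
BETA_VQ = 0.25       # vector-quantization commitment weight
BATCH_SIZE = 192
INIT_LR = 1e-3
LR_DECAY = 0.8       # multiply LR by 0.8 on a worse validation loss
MIN_LR = 1e-5        # stop training once the LR falls below this


class PlaceholderPCVAE(nn.Module):
    """Stand-in module with the reported dimensions; NOT the real PCVAE."""

    def __init__(self, vocab_size=20000):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, EMB_DIM)
        self.encoder = nn.GRU(EMB_DIM, HIDDEN_DIM, batch_first=True)
        self.decoder = nn.GRU(EMB_DIM, HIDDEN_DIM, batch_first=True)
        # Discrete latent codebooks; the exact shape is a guess.
        self.codebooks = nn.Parameter(
            torch.randn(N_CODEBOOKS, N_CODEWORDS, HIDDEN_DIM))


model = PlaceholderPCVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR)
# ReduceLROnPlateau mimics "decrease the learning rate by 0.8 when a worse
# validation loss is obtained"; patience=0 reacts to the first worse epoch.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=LR_DECAY, patience=0)


def current_lr(opt):
    return opt.param_groups[0]["lr"]


for epoch in range(1000):
    # train_one_epoch(...) and evaluate(...) would go here; the random
    # value below is a stand-in for the real validation loss.
    val_loss = float(torch.rand(1))
    scheduler.step(val_loss)
    if current_lr(optimizer) < MIN_LR:
        break  # stopping criterion reported in the paper
```

The scheduler choice is only a convenient approximation of the paper's rule: `ReduceLROnPlateau` tracks the best validation loss seen so far, whereas the paper's wording could also mean a comparison against the previous epoch only.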