Taming Continuous Posteriors for Latent Variational Dialogue Policies
Authors: Marin Vlastelica, Patrick Ernst, Gyuri Szarvas
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using the MultiWOZ benchmark (Budzianowski et al. 2018), we show that TCUP is able to improve the state-of-the-art performance across different metrics. We provide a detailed evaluation of TCUP’s dialogue policy in Sec. 4.1. This includes comparing its performance on the MultiWOZ benchmark; an ablation study to assess the importance of our technical contributions from Sec. 3; and a qualitative analysis of response coherence. |
| Researcher Affiliation | Collaboration | ¹Autonomous Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany; ²Amazon Development Center Germany GmbH, Berlin, Germany |
| Pseudocode | No | The paper describes its methods but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | Using the MultiWOZ benchmark (Budzianowski et al. 2018). We evaluate TCUP using MultiWOZ 2.1 (Wang et al. 2020) on the policy learning task for context-to-response generation. MultiWOZ contains 10,438 dialogues across six different domains, pre-split into 8,438 training, 1,000 validation, and 1,000 testing records. |
| Dataset Splits | Yes | MultiWOZ contains 10,438 dialogues across six different domains, pre-split into 8,438 training, 1,000 validation, and 1,000 testing records. (A hypothetical loading sketch reflecting these splits appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper describes the models and frameworks used (e.g., recurrent encoder-decoder architecture, variational inference) but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper describes the training stages, loss functions, and some methodological choices (e.g., weighted cross-entropy, batched policy gradient, replay buffer with probability λ). However, it does not provide specific hyperparameter values (such as learning rate, batch size, number of epochs, or optimizer settings) in the main text. It mentions that a sensitivity analysis for λ is in the Appendix, but those values are not present here. (A hedged sketch of such a training step appears after the table.) |
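
For concreteness, the 8,438 / 1,000 / 1,000 split reported above can be checked with a short loading script. This is a minimal sketch, not code from the paper: the file names (`data.json`, `valListFile.txt`, `testListFile.txt`) are modeled on the layout of the public MultiWOZ distribution and should be treated as assumptions to adjust against the actual release.

```python
import json

# Hypothetical file layout, modeled on the public MultiWOZ distribution;
# adjust paths/names to the release you actually download.
with open("data.json") as f:
    dialogues = json.load(f)          # dialogue_id -> dialogue
with open("valListFile.txt") as f:
    val_ids = set(f.read().split())   # dialogue IDs held out for validation
with open("testListFile.txt") as f:
    test_ids = set(f.read().split())  # dialogue IDs held out for testing

# Everything not listed as validation or test is training data.
train_ids = set(dialogues) - val_ids - test_ids

# The paper reports 8438 / 1000 / 1000 train/validation/test dialogues.
print(len(train_ids), len(val_ids), len(test_ids))
```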
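The Experiment Setup row names a weighted cross-entropy loss and a replay buffer sampled with probability λ, but the paper gives no hyperparameter values in the main text. The PyTorch sketch below illustrates one plausible reading of those two pieces of the training step; `policy`, `replay_buffer`, `class_weights`, and `LAMBDA = 0.5` are all placeholders introduced here, not values or code from the paper.

```python
import random
import torch
import torch.nn.functional as F

LAMBDA = 0.5  # placeholder: the paper defers the sensitivity analysis for λ to its Appendix

def training_step(policy, optimizer, online_batch, replay_buffer, class_weights):
    """One hypothetical update: with probability λ, train on a batch drawn
    from the replay buffer instead of the current on-policy batch."""
    if replay_buffer and random.random() < LAMBDA:
        contexts, targets = random.choice(replay_buffer)
    else:
        contexts, targets = online_batch
        replay_buffer.append(online_batch)  # store the fresh batch for later reuse

    logits = policy(contexts)  # (batch, seq_len, vocab)

    # Weighted cross-entropy over the vocabulary, as named in the paper;
    # the per-class weights themselves are an assumption here.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        weight=class_weights,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The sketch covers only the replay-buffer sampling and the weighted loss; the batched policy gradient stage the paper also mentions is not reconstructed here, since the report confirms no further detail in the main text.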