Taming Continuous Posteriors for Latent Variational Dialogue Policies
Authors: Marin Vlastelica, Patrick Ernst, Gyuri Szarvas
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using the MultiWOZ benchmark (Budzianowski et al. 2018), we show that TCUP is able to improve the state-of-the-art performance across different metrics. We provide a detailed evaluation of TCUP’s dialogue policy in Sec. 4.1. This includes comparing its performance on the MultiWOZ benchmark; an ablation study to assess the importance of our technical contributions from Sec. 3; and a qualitative analysis of response coherence. |
| Researcher Affiliation | Collaboration | ¹Autonomous Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany; ²Amazon Development Center Germany GmbH, Berlin, Germany |
| Pseudocode | No | The paper describes its methods but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | Using the MultiWOZ benchmark (Budzianowski et al. 2018). We evaluate TCUP using MultiWOZ 2.1 (Wang et al. 2020) on the policy learning task for context-to-response generation. MultiWOZ contains 10,438 dialogues across six different domains, pre-split into 8,438 training, 1,000 validation, and 1,000 testing records. |
| Dataset Splits | Yes | MultiWOZ contains 10,438 dialogues across six different domains, pre-split into 8,438 training, 1,000 validation, and 1,000 testing records. (A hypothetical loading sketch reflecting these splits appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper describes the models and frameworks used (e.g., recurrent encoder-decoder architecture, variational inference) but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper describes the training stages, loss functions, and some methodological choices (e.g., weighted cross-entropy, batched policy gradient, replay buffer with probability λ). However, it does not provide specific hyperparameter values (such as learning rate, batch size, number of epochs, or optimizer settings) in the main text. It mentions that a sensitivity analysis for λ is in the Appendix, but those values are not present here. (A hedged sketch of such a training step appears after the table.) |
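
For concreteness, the 8,438 / 1,000 / 1,000 split reported above can be checked with a short loading script. This is a minimal sketch, not code from the paper: the file names (`data.json`, `valListFile.txt`, `testListFile.txt`) are modeled on the layout of the public MultiWOZ distribution and should be treated as assumptions to adjust against the actual release.

```python
import json

# Hypothetical file layout, modeled on the public MultiWOZ distribution;
# adjust paths/names to the release you actually download.
with open("data.json") as f:
    dialogues = json.load(f)          # dialogue_id -> dialogue
with open("valListFile.txt") as f:
    val_ids = set(f.read().split())   # dialogue IDs held out for validation
with open("testListFile.txt") as f:
    test_ids = set(f.read().split())  # dialogue IDs held out for testing

# Everything not listed as validation or test is training data.
train_ids = set(dialogues) - val_ids - test_ids

# The paper reports 8438 / 1000 / 1000 train/validation/test dialogues.
print(len(train_ids), len(val_ids), len(test_ids))
```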
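The Experiment Setup row names a weighted cross-entropy loss and a replay buffer sampled with probability λ, but the paper gives no hyperparameter values in the main text. The PyTorch sketch below illustrates one plausible reading of those two pieces of the training step; `policy`, `replay_buffer`, `class_weights`, and `LAMBDA = 0.5` are all placeholders introduced here, not values or code from the paper.

```python
import random
import torch
import torch.nn.functional as F

LAMBDA = 0.5  # placeholder: the paper defers the sensitivity analysis for λ to its Appendix

def training_step(policy, optimizer, online_batch, replay_buffer, class_weights):
    """One hypothetical update: with probability λ, train on a batch drawn
    from the replay buffer instead of the current on-policy batch."""
    if replay_buffer and random.random() < LAMBDA:
        contexts, targets = random.choice(replay_buffer)
    else:
        contexts, targets = online_batch
        replay_buffer.append(online_batch)  # store the fresh batch for later reuse

    logits = policy(contexts)  # (batch, seq_len, vocab)

    # Weighted cross-entropy over the vocabulary, as named in the paper;
    # the per-class weights themselves are an assumption here.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        weight=class_weights,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The sketch covers only the replay-buffer sampling and the weighted loss; the batched policy gradient stage the paper also mentions is not reconstructed here, since the report confirms no further detail in the main text.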