Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder
Authors: Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, Sunghun Kim
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two popular datasets show that DialogWAE outperforms the state-of-the-art approaches in generating more coherent, informative and diverse responses. |
| Researcher Affiliation | Collaboration | Hong Kong University of Science and Technology, New York University, Clova AI Research, NAVER |
| Pseudocode | Yes | Algorithm 1: DialogWAE Training (UEnc: utterance encoder; CEnc: context encoder; RecNet: recognition network; PriNet: prior network; Dec: decoder). K=3, n_critic=5 in all experiments. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our model on two dialogue datasets, Dailydialog (Li et al., 2017b) and Switchboard (Godfrey and Holliman, 1997), which have been widely used in recent studies (Shen et al., 2018; Zhao et al., 2017). |
| Dataset Splits | Yes | The datasets are separated into training, validation, and test sets with the same ratios as in the baseline papers, that is, 2316:60:62 for Switchboard (Zhao et al., 2017) and 10:1:1 for Dailydialog (Shen et al., 2018), respectively. |
| Hardware Specification | No | The paper mentions that models are 'fine-tuned with NAVER Smart Machine Learning (NSML) platform', but does not specify any hardware details like CPU, GPU models, or memory for the experimental setup. |
| Software Dependencies | Yes | All the models are implemented with Pytorch 0.4.0, and fine-tuned with NAVER Smart Machine Learning (NSML) platform (Sung et al., 2017; Kim et al., 2018). |
| Experiment Setup | Yes | The models are trained with mini-batches containing 32 examples each in an end-to-end manner. In the AE phase, the models are trained by SGD with an initial learning rate of 1.0 and gradient clipping at 1 (Pascanu et al., 2013). We decay the learning rate by 40% every 10th epoch. In the GAN phase, the models are updated using RMSprop (Tieleman and Hinton) with fixed learning rates of $5 \times 10^{-5}$ and $1 \times 10^{-5}$ for the generator and the discriminator, respectively. We tune the hyper-parameters on the validation set and measure the performance on the test set. |
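
The hyper-parameters quoted in the Experiment Setup and Dataset Splits rows can be collected into a small configuration sketch. This is not the authors' code: the names `TrainConfig`, `ae_learning_rate`, and `split_fractions` are hypothetical. Two readings are assumed here: "decay the learning rate by 40%" is taken to mean multiplying by 0.6, and the decay is taken to apply at 0-indexed epochs 10, 20, and so on.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Values quoted in the table above; field names are illustrative."""
    batch_size: int = 32
    ae_initial_lr: float = 1.0          # SGD, AE phase
    ae_grad_clip: float = 1.0           # gradient clipping threshold
    ae_decay_fraction: float = 0.4      # "decay the learning rate by 40%"
    ae_decay_every: int = 10            # "every 10th epoch"
    gan_lr_generator: float = 5e-5      # RMSprop, GAN phase
    gan_lr_discriminator: float = 1e-5  # RMSprop, GAN phase


def ae_learning_rate(epoch: int, cfg: TrainConfig = TrainConfig()) -> float:
    """AE-phase learning rate at a 0-indexed epoch (indexing is an assumption)."""
    n_decays = epoch // cfg.ae_decay_every
    # Assumption: each decay step keeps (1 - 0.4) = 0.6 of the previous rate.
    return cfg.ae_initial_lr * (1.0 - cfg.ae_decay_fraction) ** n_decays


def split_fractions(train: int, valid: int, test: int) -> Tuple[float, float, float]:
    """Convert a train:valid:test ratio into fractions of the whole dataset."""
    total = train + valid + test
    return train / total, valid / total, test / total


if __name__ == "__main__":
    for epoch in (0, 10, 20):
        print(epoch, ae_learning_rate(epoch))
    print("Switchboard:", split_fractions(2316, 60, 62))
    print("Dailydialog:", split_fractions(10, 1, 1))
```

Under these assumptions the AE-phase rate stays at 1.0 for the first ten epochs and is then scaled by 0.6 at each subsequent block of ten. In PyTorch, a step schedule of this kind would typically be expressed with `torch.optim.lr_scheduler.StepLR`, although the table does not say which mechanism the authors used.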