Deconvolutional Latent-Variable Model for Text Sequence Matching

Authors: Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments
Researcher Affiliation Academia Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin; Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708; {dinghan.shen, yizhe.zhang, ricardo.henao, qinliang.su, lcarin}@duke.edu
Pseudocode No The paper describes the model architecture and processes verbally and with diagrams, but no structured pseudocode or algorithm blocks are provided.
Open Source Code No The paper does not include an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets Yes Table 1: Summary of text sequence matching datasets.
Dataset | Train | Test | Classes | Vocabulary
Quora | 384,348 | 10,000 | 2 | 10k
SNLI | 549,367 | 9,824 | 3 | 20k
Further, we apply our models to two standard text sequence matching tasks: Recognizing Textual Entailment (RTE) and paraphrase identification, in a semi-supervised setting. The summary statistics of both datasets are presented in Table 1.
Dataset Splits Yes Dropout (Srivastava et al. 2014) is employed on both word embedding and latent variable layers, with rates selected from {0.3, 0.5, 0.8} on the validation set. We set the mini-batch size to 32. In semi-supervised sequence matching experiments, the L2 norm of the weight vectors is employed as a regularization term in the loss function, and the coefficient of the L2 loss is treated as a hyperparameter and tuned on the validation set. (A sketch of this regularization setup follows the table.)
Hardware Specification Yes All experiments are implemented in Tensorflow (Abadi et al. 2016), using one NVIDIA GeForce GTX TITAN X GPU with 12GB memory.
Software Dependencies No All experiments are implemented in Tensorflow (Abadi et al. 2016)... No specific version number for Tensorflow or other software dependencies is provided.
Experiment Setup Yes We use 3-layer convolutional neural networks for the inference/encoder network... for all layers we set the filter window size (W) as 5, with a stride of 2. The feature maps (K) are set as 300, 600, 500, for layers 1 through 3, respectively... The model is trained using Adam (Kingma and Ba 2014) with a learning rate of 3 × 10⁻⁴ for all parameters. Dropout (Srivastava et al. 2014) is employed on both word embedding and latent variable layers, with rates selected from {0.3, 0.5, 0.8} on the validation set. We set the mini-batch size to 32.
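
The Experiment Setup row lends itself to a short configuration sketch. The following is a minimal TensorFlow/Keras sketch, not the authors' code: the layer widths, window size, stride and learning rate come from the quote above, while the vocabulary size, embedding dimension, sequence length, pooling step and the default dropout rate are assumptions.

```python
# Minimal sketch (not the authors' released code) of the quoted encoder setup:
# three convolutional layers with window size W = 5, stride 2 and feature maps
# K = 300/600/500, trained with Adam at 3e-4 and dropout on the word-embedding
# and latent layers. VOCAB_SIZE, EMBED_DIM, MAX_LEN, the pooling step and the
# default dropout rate are assumptions.
import tensorflow as tf

VOCAB_SIZE = 20000   # assumption: roughly the SNLI 20k vocabulary from Table 1
EMBED_DIM = 300      # assumption: standard 300-d word embeddings
MAX_LEN = 50         # assumption: padded sequence length

def build_encoder(dropout_rate=0.5):
    """Three 1-D conv layers (K = 300, 600, 500; W = 5; stride 2) over embeddings."""
    tokens = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
    x = tf.keras.layers.Dropout(dropout_rate)(x)             # dropout on word embeddings
    for filters in (300, 600, 500):
        x = tf.keras.layers.Conv1D(filters, kernel_size=5, strides=2,
                                   padding="same", activation="relu")(x)
    z = tf.keras.layers.GlobalMaxPooling1D()(x)              # assumption: pool to one vector
    z = tf.keras.layers.Dropout(dropout_rate)(z)             # dropout on the latent layer
    return tf.keras.Model(tokens, z)

encoder = build_encoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)     # Adam, lr 3e-4; batch size 32 in the paper
```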
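
The Dataset Splits row describes the remaining regularization choices. The block below is a hedged sketch of that setup under stated assumptions: the classification head, feature dimension and concrete L2 coefficient are placeholders, since the paper tunes the coefficient and selects the dropout rate on the validation set.

```python
# Hedged sketch of the regularization details quoted in the Dataset Splits row:
# dropout rates are chosen from {0.3, 0.5, 0.8} on the validation set, and an L2
# penalty on the weight vectors is added to the loss with a coefficient that is
# likewise tuned on validation data. The classification head, FEATURE_DIM and
# the concrete L2 value are assumptions for illustration only.
import tensorflow as tf

FEATURE_DIM = 500                 # assumption: size of the latent representation
DROPOUT_RATES = (0.3, 0.5, 0.8)   # candidate dropout rates from the paper
L2_COEF = 1e-4                    # assumption: the actual coefficient is validation-tuned

def build_head(dropout_rate, l2_coef, num_classes=3):
    """Classification head with dropout and an L2 penalty on its weight vector."""
    features = tf.keras.Input(shape=(FEATURE_DIM,))
    x = tf.keras.layers.Dropout(dropout_rate)(features)
    logits = tf.keras.layers.Dense(
        num_classes,
        kernel_regularizer=tf.keras.regularizers.l2(l2_coef))(x)
    return tf.keras.Model(features, logits)

# One candidate model per dropout rate; the rate (and the L2 coefficient) would
# then be selected by validation performance, mirroring the quoted procedure.
candidates = {rate: build_head(rate, L2_COEF) for rate in DROPOUT_RATES}
```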