Deconvolutional Latent-Variable Model for Text Sequence Matching
Authors: Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments |
| Researcher Affiliation | Academia | Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin; Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708; {dinghan.shen, yizhe.zhang, ricardo.henao, qinliang.su, lcarin}@duke.edu |
| Pseudocode | No | The paper describes the model architecture and processes verbally and with diagrams, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Table 1 (summary of text sequence matching datasets): Quora: 384,348 train / 10,000 test / 2 classes / 10k vocabulary; SNLI: 549,367 train / 9,824 test / 3 classes / 20k vocabulary. Further, we apply our models to two standard text sequence matching tasks: Recognizing Textual Entailment (RTE) and paraphrase identification, in a semi-supervised setting. The summary statistics of both datasets are presented in Table 1. |
| Dataset Splits | Yes | Dropout (Srivastava et al. 2014) is employed on both word embedding and latent variable layers, with rates selected from {0.3, 0.5, 0.8} on the validation set. We set the mini-batch size to 32. In semi-supervised sequence matching experiments, the L2 norm of the weight vectors is employed as a regularization term in the loss function, and the coefficient of the L2 loss is treated as a hyperparameter tuned on the validation set. (See the tuning sketch below the table.) |
| Hardware Specification | Yes | All experiments are implemented in Tensorflow (Abadi et al. 2016), using one NVIDIA GeForce GTX TITAN X GPU with 12GB memory. |
| Software Dependencies | No | All experiments are implemented in Tensorflow (Abadi et al. 2016)... No specific version number for Tensorflow or other software dependencies is provided. |
| Experiment Setup | Yes | We use 3-layer convolutional neural networks for the inference/encoder network... for all layers we set the filter window size (W) as 5, with a stride of 2. The feature maps (K) are set as 300, 600, 500, for layers 1 through 3, respectively... The model is trained using Adam (Kingma and Ba 2014) with a learning rate of 3 × 10⁻⁴ for all parameters. Dropout (Srivastava et al. 2014) is employed on both word embedding and latent variable layers, with rates selected from {0.3, 0.5, 0.8} on the validation set. We set the mini-batch size to 32. (See the encoder sketch below the table.) |
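
To make the configuration quoted in the Experiment Setup row concrete, here is a minimal sketch of the 3-layer convolutional encoder, written with the tf.keras API. Since the paper releases no code, the function and variable names (`build_encoder`, `max_len`, `latent_dim`), the ReLU activations, and the pooling step are assumptions for illustration only; the filter window size, stride, feature-map counts, dropout placement, and Adam learning rate follow the quoted text.

```python
# Hedged sketch of the paper's 3-layer CNN encoder; names and activations are assumed.
import tensorflow as tf


def build_encoder(vocab_size=20000, embed_dim=300, max_len=60, latent_dim=500):
    """3-layer Conv1D encoder: window size W=5, stride 2, feature maps K=300/600/500."""
    tokens = tf.keras.Input(shape=(max_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embed_dim)(tokens)
    # Dropout on word embeddings; rate selected from {0.3, 0.5, 0.8} per the paper.
    x = tf.keras.layers.Dropout(0.5)(x)
    for filters in (300, 600, 500):  # feature maps K for layers 1 through 3
        x = tf.keras.layers.Conv1D(filters, kernel_size=5, strides=2,
                                   padding="same", activation="relu")(x)
    # Assumed pooling to a fixed-size latent code (the paper does not quote this step).
    z = tf.keras.layers.GlobalMaxPooling1D()(x)
    # Dropout on the latent-variable layer, also from {0.3, 0.5, 0.8}.
    z = tf.keras.layers.Dropout(0.5)(z)
    return tf.keras.Model(tokens, z)


encoder = build_encoder()
# Learning rate of 3e-4 for all parameters, as quoted; mini-batch size is 32.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)
```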
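
Similarly, the hyperparameter selection described in the Dataset Splits row (dropout rate from {0.3, 0.5, 0.8} and a tuned L2 coefficient, both chosen on the validation set) can be sketched as a simple grid search. `train_and_evaluate` is a hypothetical helper standing in for a full training run, and the L2 coefficient grid is an assumption, since the paper does not state the candidate values.

```python
# Hedged sketch of validation-set hyperparameter tuning; the L2 grid is assumed.
import tensorflow as tf


def train_and_evaluate(dropout_rate: float, l2_coef: float) -> float:
    """Hypothetical helper: train the matching model, return validation accuracy."""
    # L2 norm of the weight vectors, added to the loss as a regularization term.
    reg = tf.keras.regularizers.l2(l2_coef)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(300, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(3),  # 3 classes for SNLI (2 for Quora)
    ])
    # ... compile with Adam (lr 3e-4), train with batch size 32,
    # and score on the held-out validation split.
    return 0.0  # placeholder


# Dropout rates are quoted from the paper; the L2 coefficient grid is an assumption.
candidates = [(r, c) for r in (0.3, 0.5, 0.8) for c in (1e-5, 1e-4, 1e-3)]
best_rate, best_coef = max(candidates, key=lambda hp: train_and_evaluate(*hp))
```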