Dual-View Variational Autoencoders for Semi-Supervised Text Matching
Authors: Zhongbin Xie, Shuai Ma
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on SNLI, Quora and a Community Question Answering dataset demonstrate the superiority of our DVVAE over several strong semi-supervised and supervised text matching models. |
| Researcher Affiliation | Academia | Zhongbin Xie and Shuai Ma; SKLSDE Lab, Beihang University, Beijing, China; Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing, China; {xiezb, mashuai}@buaa.edu.cn |
| Pseudocode | No | The paper describes the model architecture and implementation but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We experiment on three datasets: SNLI [Bowman et al., 2015] for Natural Language Inference, Quora Question Pairs for Paraphrase Identification, and a Community Question Answering (CQA) dataset [Nakov et al., 2015] for Question Answering. (ii) For the CQA dataset, the original train set is used as Dl, and we additionally adopt WikiQA [Yang et al., 2015], which has 29k QA pairs, as Du by removing all its labels. |
| Dataset Splits | Yes | We adopt early stopping where performance on dev set is evaluated every time Dl is traversed. Dataset sizes: SNLI 549,367 train / 9,842 dev / 9,824 test; Quora 384,348 train / 10,000 dev / 10,000 test; CQA 16,541 train / 1,645 dev / 1,976 test. For SNLI, we select 5.25%, 10.8% and 22.2% of the original train set to be Dl (i.e., approximately 28k, 59k and 120k labeled pairs), and remove the labels of the remaining data in the train set to make up Du; for Quora, we select 1k, 5k, 10k and 25k labeled pairs in the train set. (A split-construction sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper states: "Experiments are implemented in PyTorch." However, it does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We set d_E = 300 and d_Z = 500. Hidden state size d_H is set to 300 for both directions. For the decoder, we choose a 3-layer dilated CNN, with dilation rates [1, 2, 4]. In all the bottleneck residual blocks, filter size is set to 3 and channel numbers are set to 300 internally and 600 externally. In the interaction matcher, we adopt a 2-layer CNN with filter sizes 5×5×8 and 3×3×16... α in Eq. (5) is set to 20. We set γ = 10 for SNLI, and γ = 20 for the other experiments. SGD with momentum 0.9 and weight decay 1×10⁻³ is adopted in optimization. We use an initial learning rate of 3×10⁻³. Batch size is tuned on {32, 64, 128} for each experiment. A dropout rate of 0.1 is used in each layer of the decoder net. (A PyTorch sketch of the decoder and optimizer settings appears after the table.) |
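
For concreteness, here is a minimal sketch of the semi-supervised split construction described in the Dataset Splits row: a fraction of the train set keeps its labels (Dl) and the rest has labels removed (Du). The function name `make_semi_supervised_split`, the random shuffling, and the `(s1, s2, label)` tuple layout are illustrative assumptions; the paper does not specify how the labeled subset is sampled.

```python
# Hypothetical sketch of the labeled/unlabeled split protocol quoted above.
import random

def make_semi_supervised_split(train_pairs, labeled_fraction, seed=0):
    """Select a labeled subset D_l; strip labels from the rest to form D_u."""
    rng = random.Random(seed)
    pairs = list(train_pairs)
    rng.shuffle(pairs)  # sampling strategy is an assumption, not from the paper
    n_labeled = int(labeled_fraction * len(pairs))
    d_l = pairs[:n_labeled]                                     # keep labels
    d_u = [(s1, s2) for (s1, s2, _label) in pairs[n_labeled:]]  # drop labels
    return d_l, d_u

# e.g. 10.8% of SNLI's 549,367 training pairs -> roughly 59k labeled pairs:
# d_l, d_u = make_semi_supervised_split(snli_train, labeled_fraction=0.108)
```
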
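The decoder and optimizer settings in the Experiment Setup row map onto PyTorch roughly as follows. Only the quoted numbers (3 layers, dilation rates [1, 2, 4], filter size 3, 300 internal / 600 external channels, dropout 0.1, SGD with momentum 0.9, weight decay 1e-3, learning rate 3e-3) come from the paper; the block internals (activation placement, 1×1 squeeze/expand layout) and the length-preserving symmetric padding are assumptions, and an autoregressive text decoder would in practice use causal (left-only) padding.

```python
# Hedged sketch of the reported 3-layer dilated-CNN decoder with
# bottleneck residual blocks; internals beyond the quoted numbers are guesses.
import torch
import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    def __init__(self, ext_ch=600, int_ch=300, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.ReLU(),
            nn.Conv1d(ext_ch, int_ch, kernel_size=1),        # squeeze: 600 -> 300
            nn.ReLU(),
            nn.Conv1d(int_ch, int_ch, kernel_size=3,         # dilated conv, filter size 3
                      dilation=dilation, padding=dilation),  # symmetric padding keeps length (assumed)
            nn.ReLU(),
            nn.Conv1d(int_ch, ext_ch, kernel_size=1),        # expand: 300 -> 600
            nn.Dropout(0.1),                                 # "dropout rate of 0.1 in each layer"
        )

    def forward(self, x):                                    # x: (batch, 600, seq_len)
        return x + self.net(x)                               # residual connection

decoder = nn.Sequential(*(BottleneckResidualBlock(dilation=d) for d in (1, 2, 4)))

# Optimizer exactly as quoted: SGD, momentum 0.9, weight decay 1e-3, lr 3e-3.
optimizer = torch.optim.SGD(decoder.parameters(),
                            lr=3e-3, momentum=0.9, weight_decay=1e-3)

# Shape check on dummy input (batch, channels, sequence length):
x = torch.randn(2, 600, 40)
assert decoder(x).shape == x.shape
```

The bottleneck layout keeps the dilated convolution cheap (it runs at 300 channels) while the residual path carries the full 600-channel representation, which is consistent with the internal/external channel numbers the paper reports.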