A Deep Generative Framework for Paraphrase Generation
Authors: Ankush Gupta, Arvind Agarwal, Prawaan Singh, Piyush Rai
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative evaluation of the proposed method on a benchmark paraphrase dataset demonstrates its efficacy and a significant performance improvement over state-of-the-art methods, while qualitative human evaluation indicates that the generated paraphrases are well-formed, grammatically correct, and relevant to the input sentence. Furthermore, the authors evaluate the method on a newly released question paraphrase dataset and establish a new baseline for future research. |
| Researcher Affiliation | Collaboration | Ankush Gupta, Arvind Agarwal ({ankushgupta,arvagarw}@in.ibm.com), IBM Research Labs, New Delhi, India; Prawaan Singh (prawaan@iitk.ac.in) and Piyush Rai (piyush@cse.iitk.ac.in), Indian Institute of Technology, Kanpur, India |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions links to external software used for evaluation (https://github.com/jhclark/multeval) and a borrowed implementation for the experimental setup (https://github.com/kefirski/pytorch_RVAE), but it does not provide a link or explicit statement about releasing the source code for the methodology described in this paper. |
| Open Datasets | Yes | MSCOCO (Lin et al. 2014): This dataset, also used previously to evaluate paraphrase generation methods (Prakash et al. 2016), contains human-annotated captions of over 120K images. Quora: Quora released a new dataset in January 2017 consisting of over 400K lines of potential duplicate question pairs (https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs). |
| Dataset Splits | Yes | MSCOCO: The dataset has separate training and validation splits; Train 2014 contains over 82K images and Val 2014 contains over 40K images. Quora: In the experiments, the model is evaluated on training set sizes of 50K, 100K, and 150K. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions borrowing settings from a PyTorch implementation (https://github.com/kefirski/pytorch_RVAE) but does not provide specific version numbers for PyTorch or any other software libraries or solvers. |
| Experiment Setup | Yes | The dimension of the embedding vector is set to 300, the dimension of both the encoder and the decoder is 600, and the latent space dimension is 1100. The encoder has 1 layer and the decoder has 2. Models are trained with stochastic gradient descent, with the learning rate fixed at 5 × 10⁻⁵ and a dropout rate of 30%. The batch size is 32. |
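As the paper releases no code, the reported experiment setup can be collected into a minimal sketch of a hyperparameter configuration. The key names below are illustrative assumptions; only the values come from the paper.

```python
# Hedged sketch: hyperparameters reported in the paper's experiment
# setup, gathered as a plain-Python config. Key names are hypothetical
# (the authors did not publish code); values are taken from the paper.
config = {
    "embedding_dim": 300,       # word-embedding vector dimension
    "encoder_hidden_dim": 600,  # encoder dimension
    "decoder_hidden_dim": 600,  # decoder dimension
    "latent_dim": 1100,         # VAE latent-space dimension
    "encoder_layers": 1,        # single-layer encoder
    "decoder_layers": 2,        # two-layer decoder
    "optimizer": "sgd",         # stochastic gradient descent
    "learning_rate": 5e-5,      # fixed; no schedule is reported
    "dropout": 0.30,            # 30% dropout rate
    "batch_size": 32,
}
```

Such a config makes the reported settings easy to audit at a glance and to feed into a reimplementation attempt.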