Paraphrase Diversification Using Counterfactual Debiasing

Authors: Sunghyun Park, Seung-won Hwang, Fuxiang Chen, Jaegul Choo, Jung-Woo Ha, Sunghun Kim, Jinyeong Yim (pp. 6883-6891)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. The problem of generating a set of diverse paraphrase sentences is examined, while (1) not compromising the meaning of the original sentence, and (2) imposing diversity in various semantic aspects, such as lexical or syntactic structure. Existing work on paraphrase generation has focused more on the former, and the latter was trained as a fixed style transfer, such as transferring from positive to negative sentiment, even at the cost of losing semantics. In this work, we consider style transfer as a means of imposing diversity, with a paraphrasing correctness constraint that the target sentence must remain a paraphrase of the original sentence. However, our goal is to maximize the diversity for a set of k generated paraphrases, denoted as the diversified paraphrase (DP) problem. Our key contribution is deciding the style guidance at generation towards the direction of increasing the diversity of the output with respect to those generated previously. As pre-materializing training data for all style decisions is impractical, we train with biased data, but with debiasing guidance. Compared to state-of-the-art methods, our proposed model can generate more diverse and yet semantically consistent paraphrase sentences. That is, our model, trained with the MSCOCO dataset, achieves the highest embedding scores, .94/.95/.86, similar to state-of-the-art results, but with a lower mBLEU score (more diverse) by 8.73%. In Sections 4 (Experimental Setup) and 5 (Results): Table 1 shows the quantitative evaluation of the different models, and in Table 3, we report the user study results.
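The mBLEU diversity score cited above can be illustrated with a minimal sketch. Assuming mBLEU is the average pairwise BLEU among the k generated paraphrases (lower means more diverse), the code below uses a simplified BLEU (clipped n-gram precisions up to bigrams, brevity penalty omitted); it is not the paper's evaluation script.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of hyp against ref (one BLEU component)."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    if not hyp_counts:
        return 0.0
    overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return overlap / sum(hyp_counts.values())

def simple_bleu(hyp, ref, max_n=2):
    # Geometric mean of n-gram precisions; brevity penalty omitted.
    precisions = [ngram_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

def mbleu(paraphrases):
    """Average pairwise BLEU among k outputs; lower = more diverse."""
    scores = [simple_bleu(hyp, ref)
              for i, hyp in enumerate(paraphrases)
              for j, ref in enumerate(paraphrases) if i != j]
    return sum(scores) / len(scores)

outputs = [
    "how do i learn python quickly".split(),
    "what is the fastest way to learn python".split(),
    "how do i learn python quickly".split(),  # duplicate pushes mBLEU up
]
print(round(mbleu(outputs), 3))
```

Replacing the duplicate with a lexically distinct paraphrase lowers the score, which is the direction the paper's model optimizes for.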
Researcher Affiliation: Collaboration. Sunghyun Park (1,2), Seung-won Hwang (1,*), Fuxiang Chen (2,4), Jaegul Choo (3), Jung-Woo Ha (2), Sunghun Kim (2,4), Jinyeong Yim (2). Affiliations: 1 Yonsei University; 2 Clova AI Research, NAVER; 3 Korea University; 4 Hong Kong University of Science and Technology.
Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide a direct statement about the release of source code or a link to a code repository for their method.
Open Datasets: Yes. We used multiple paraphrase datasets for training/development/testing (Quora 2018; Bowman et al. 2015; Dolan, Brockett, and Quirk 2005; Lin et al. 2014; Coster and Kauchak 2011). These datasets are also widely used in previous paraphrase generation work (Prakash et al. 2016; Gupta et al. 2018; Brad and Rebedea 2017). Quora: released by Quora in 2017, this dataset contains question pairs (asked on Quora) that are paraphrases of one another. Microsoft: released by Microsoft in 2005, this dataset consists of 5,800 pairs of sentences extracted from online news sources that are annotated as paraphrases. SNLI: released by the Stanford NLP Group (Stanford 2018) in 2015, this dataset contains 570K human-written English sentence pairs that are human-labeled as entailment, contradiction, or neutral. MSCOCO: released by the COCO Consortium (COCO 2018) in 2017, this dataset consists of 123K images that are human-annotated with five annotations per image (Lin et al. 2014). Wikipedia: released by Coster and Kauchak (2011), this dataset consists of 137K sentences from Wikipedia paired with their simplified forms.
Dataset Splits: Yes. For each of the above six datasets, we randomly split the dataset into training/development/testing, following the distribution of previous studies (Gupta et al. 2018; Patro et al. 2018). The trained models are tuned based on the development dataset.
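A random train/dev/test split as described above can be sketched as follows. The 80/10/10 ratio and the fixed seed are illustrative assumptions; the paper follows the split distributions of prior work rather than stating explicit ratios.

```python
import random

def split_dataset(pairs, train_frac=0.8, dev_frac=0.1, seed=42):
    """Randomly partition paraphrase pairs into train/dev/test.
    Ratios here are illustrative, not taken from the paper."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    pairs = list(pairs)
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_dev],
            pairs[n_train + n_dev:])

data = [(f"question {i}", f"paraphrase {i}") for i in range(100)]
train, dev, test = split_dataset(data)
print(len(train), len(dev), len(test))  # 80 10 10
```

Shuffling before slicing ensures each partition draws uniformly from the dataset, and the seeded RNG makes the split reproducible across runs.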
Hardware Specification: No. The paper states, 'We implemented our models and experimented using NSML (Kim et al. 2018).' However, it does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies: No. The paper mentions using NSML (Kim et al. 2018), which is a platform, and bidirectional GRU encoders (Chung et al. 2014). However, it does not specify version numbers for any software components or libraries.
Experiment Setup: Yes. In our decoder networks, we use beam search with a beam size of 10, following the same setting as the previous SOTA model (Gupta et al. 2018), and we did not perform any additional reranking. All models are configured consistently to produce the top eight generated paraphrases.
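The decoding setup above (beam size 10, top eight outputs, no reranking) can be sketched with a minimal beam-search decoder. The `toy_scorer` below is a placeholder for a decoder network, not the paper's model; it only demonstrates how the beam-size and n-best parameters interact.

```python
import math

def beam_search(score_next, beam_size=10, max_len=5, n_best=8,
                bos="<s>", eos="</s>"):
    """Minimal beam search: keep the beam_size highest log-prob partial
    sequences per step; return up to n_best completed sequences.
    `score_next` maps a token prefix (tuple) to {token: log_prob}."""
    beams = [((bos,), 0.0)]
    done = []
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            for tok, tok_lp in score_next(prefix).items():
                seq, seq_lp = prefix + (tok,), lp + tok_lp
                if tok == eos:
                    done.append((seq, seq_lp))  # completed hypothesis
                else:
                    candidates.append((seq, seq_lp))
        beams = sorted(candidates, key=lambda x: -x[1])[:beam_size]
        if not beams:
            break
    done.sort(key=lambda x: -x[1])
    return done[:n_best]

def toy_scorer(prefix):
    # Placeholder model: after <s>, pick one word uniformly; then end.
    if len(prefix) == 1:
        return {w: math.log(1 / 3) for w in ("fast", "quick", "rapid")}
    return {"</s>": 0.0}

for seq, lp in beam_search(toy_scorer, beam_size=10, n_best=8):
    print(" ".join(seq), round(lp, 3))
```

With a real decoder, the beam would typically contain more than eight completed hypotheses, and the top eight by log-probability would be returned as the generated paraphrases, with no reranking stage afterwards.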