Novelty Controlled Paraphrase Generation with Retrieval Augmented Conditional Prompt Tuning

Authors: Jishnu Ray Chowdhury, Yong Zhuang, Shuyi Wang

AAAI 2022, pp. 10535-10544

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By conducting extensive experiments on four datasets, we demonstrate the effectiveness of the proposed approaches for retaining the semantic content of the original text while inducing lexical novelty in the generation.
Researcher Affiliation | Collaboration | 1 University of Illinois at Chicago, 2 Bloomberg; jraych2@uic.edu, yzhuang52@bloomberg.net, swang1072@bloomberg.net
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using 'the official code for LoRA', which refers to a third-party library, not the authors' own implementation code. There is no explicit statement or link provided for the source code of their proposed methods.
Open Datasets | Yes | Quora Question Pairs 50K split (QQP 50K): Quora Question Pairs (QQP) is a paraphrase detection dataset. We only use the true paraphrase pairs. We use the 50K dataset split as used in Gupta et al. (2018). Microsoft Research Paraphrase Corpus (MSRPC): MSRPC (Dolan, Quirk, and Brockett 2004) is another paraphrase detection corpus. ParaSCI-ACL: ParaSCI-ACL (Dong, Wan, and Cao 2021) is a paraphrase generation dataset in the scientific domain. We use the official split. (See the dataset-loading sketch after this table.)
Dataset Splits | Yes | Details of dataset split sizes are presented in Table 3, e.g. QQP 50K: 46,000 training / 4,000 validation / 4,000 test.
Hardware Specification | Yes | The models are trained and tuned on single Tesla V100 32GB GPUs.
Software Dependencies | No | The paper mentions specific software components like 'AdamW', 'Transformers library (Wolf et al. 2020)', 'sentence-transformers', and 'LoRA' but does not provide specific version numbers for these. (A version-logging sketch follows the table.)
Experiment Setup | Yes | We tune the hyperparameters on QQP 50K with GPT2 medium for all the approaches. We search the learning rate within {0.1, 0.01, 1e-3, 1e-4, 5e-5}. For adapter tuning, we search the adapter bottleneck hidden state dimension within {128, 256, 512}. For LoRA, LPT, RAPT, and NC-RAPT (all approaches involving LoRA), we fix r (matrix rank) at 8. We also use a weight decay of 0.01 for LoRA-based methods. We set the infix length for all prompt tuning methods to 8. We search the prefix length of prompt tuning random, prefix tuning, and prefix-layer tuning within {8, 64, 256}. In all cases, we use AdamW (Loshchilov and Hutter 2019) as the optimizer. We also use a linear schedule with warmup for 100 steps, a gradient norm clipping with a maximum of 1, a batch size of 32, and a maximum decoding length of n+100. We set the early stopping patience as 3. Model selection during training is done based on validation loss. (A training-setup sketch follows the table.)
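
The Open Datasets and Dataset Splits rows describe QQP preprocessing: only true paraphrase pairs are kept, and the 50K split has 46,000/4,000/4,000 examples. Below is a minimal sketch of that filtering step, assuming the GLUE release of QQP via the Hugging Face datasets library; this release is not necessarily identical to the Gupta et al. (2018) 50K split used in the paper.

```python
# Minimal sketch: keep only the true paraphrase pairs from QQP, as described in
# the Open Datasets row. Uses the GLUE release of QQP (an assumption); the paper
# instead uses the 50K split of Gupta et al. (2018) with 46K/4K/4K examples.
from datasets import load_dataset

qqp_train = load_dataset("glue", "qqp", split="train")
true_pairs = qqp_train.filter(lambda ex: ex["label"] == 1)  # label 1 marks duplicate (paraphrase) pairs
print(f"{len(true_pairs)} true paraphrase pairs")
print(true_pairs[0]["question1"], "->", true_pairs[0]["question2"])
```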
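
Because the Software Dependencies row notes that no version numbers are reported, the following sketch shows one way to log the versions of the named components in a given environment. The pip package names (in particular 'loralib' for the official LoRA code) are assumptions, not taken from the paper.

```python
# Log installed versions of the libraries named in the paper, which does not pin them.
# Package names here are assumptions for illustration.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "transformers", "sentence-transformers", "loralib"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```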
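
Finally, a minimal PyTorch/Transformers sketch wiring together the optimization settings reported in the Experiment Setup row: GPT-2 medium, AdamW, a linear schedule with 100 warmup steps, gradient-norm clipping at 1, batch size 32, and weight decay 0.01 (which the paper applies to the LoRA-based methods). The learning rate shown is one value from the reported search grid; the total step count and toy batch are placeholders, and the prompt-tuning and LoRA modules themselves are omitted.

```python
# Sketch of the reported optimization setup: AdamW, linear warmup schedule (100 steps),
# and gradient-norm clipping at 1.0, on the GPT-2 medium backbone used for tuning.
# Anything marked "placeholder" is an assumption, not a value from the paper.
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2TokenizerFast, get_linear_schedule_with_warmup

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token                 # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

total_steps = 1000                                        # placeholder; depends on epochs and data size
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)  # lr taken from the search grid
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=total_steps)

# One illustrative optimization step on a toy batch (the paper uses batch size 32).
batch = tokenizer(["how can i learn python quickly?"] * 4, return_tensors="pt", padding=True)
outputs = model(**batch, labels=batch["input_ids"])       # for a real run, mask pad positions with -100
outputs.loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```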