GAPX: Generalized Autoregressive Paraphrase-Identification X

Authors: Yifei Zhou, Renyu Li, Hayden Housen, Ser-Nam Lim

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our findings with strong empirical results. Our experiments are designed to (1) verify that the task of paraphrase identification suffers from dataset biases, which are the main obstacle to generalization in this field of study, (2) test the accuracy of our perplexity-based out-of-distribution detection method, and (3) test that balancing the utilization of the negative model can help outperform the state of the art in the face of distribution shift, without losing in the in-distribution scenarios. (A hedged sketch of the perplexity-based scoring follows the table.)
Researcher Affiliation | Collaboration | Yifei Zhou (Cornell University, yz639@cornell.edu); Renyu Li (Cornell University, rl626@cornell.edu); Hayden Housen (Cornell University, hth33@cornell.edu); Ser-Nam Lim (Meta AI, sernamlim@fb.com)
Pseudocode | No | The paper describes its methodology in detail using mathematical formulations and textual explanations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at: https://github.com/YifeiZhou02/generalized_paraphrase_identification
Open Datasets | Yes | We compare our method against other state-of-the-art methods on different combinations of the following datasets: Quora Question Pairs (QQP), the Workshop on Machine Translation Metrics Task 2017 (WMT) [8], Paraphrase and Semantic Similarity in Twitter (PIT) [53, 54], and Paraphrase Adversaries from Word Scrambling (PAWS) [62].
Dataset Splits | Yes | To scale QQP down to approximately the same size as PAWS and PIT (see below), we take the first 10k training pairs and 2k testing pairs from the train and test split by Wang et al. [52]. ... Hence, we use the original development set of 1,896 pairs as the test set while keeping the original test set of 350 pairs for development. The training set contains 5,332 sentence pairs. (A hedged sketch of this split handling follows the table.)
Hardware Specification | Yes | All experiments are run on an Nvidia 2080 Ti with 11 GB of memory.
Software Dependencies | No | The paper mentions pretrained models such as BART and BERT and the optimizers used, but it does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | To optimize the conditional sentence generators, we use the Adam optimizer with learning rate 2e-5. We adopt a cross-entropy loss for each word logit. (A hedged sketch of this optimization setup follows the table.)
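
The perplexity-based out-of-distribution check mentioned in the Research Type row can be illustrated with a minimal sketch. Everything concrete below is an assumption: GPT-2 serves only as a stand-in scoring language model, the max-over-pair aggregation and the threshold are illustrative, and the paper itself uses its perplexity signal to balance the utilization of the negative model rather than as a hard in/out filter.

```python
# Minimal sketch of perplexity-based out-of-distribution scoring.
# Assumptions: GPT-2 as a stand-in scoring language model, max-over-pair
# aggregation, and an illustrative threshold; the released GAPX code may
# differ in all three choices.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under the language model."""
    enc = tokenizer(text, return_tensors="pt")
    # With labels equal to the inputs, the model returns the mean
    # cross-entropy over tokens; exponentiating gives perplexity.
    out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

def looks_out_of_distribution(sentence_a: str, sentence_b: str,
                              threshold: float = 80.0) -> bool:
    """Flag a sentence pair whose perplexity exceeds a threshold
    calibrated on in-distribution training data (hypothetical value)."""
    return max(perplexity(sentence_a), perplexity(sentence_b)) > threshold
```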
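
The split handling quoted in the Dataset Splits row amounts to simple slicing of QQP and a dev/test swap for PIT. The sketch below assumes tab-separated files and hypothetical file paths; the released data format may differ.

```python
# Sketch of the dataset-split handling quoted in the Dataset Splits row:
# keep the first 10k/2k QQP pairs and swap PIT's dev/test roles.
# File names and the tab-separated layout are assumptions.
import pandas as pd

def load_qqp_subset(train_path: str, test_path: str):
    train = pd.read_csv(train_path, sep="\t").head(10_000)  # first 10k training pairs
    test = pd.read_csv(test_path, sep="\t").head(2_000)     # first 2k testing pairs
    return train, test

def load_pit_splits(train_path: str, dev_path: str, test_path: str):
    train = pd.read_csv(train_path, sep="\t")  # 5,332 sentence pairs
    test = pd.read_csv(dev_path, sep="\t")     # original dev (1,896 pairs) used as test
    dev = pd.read_csv(test_path, sep="\t")     # original test (350 pairs) used as dev
    return train, dev, test
```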
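
The Experiment Setup row fixes only the optimizer (Adam, learning rate 2e-5) and the token-level cross-entropy loss. The sketch below fills in the rest with assumptions: a facebook/bart-base checkpoint (the paper mentions BART but the exact checkpoint is not quoted here), padding and truncation settings, and a bare training step without scheduling or gradient clipping.

```python
# Sketch of the reported optimization setup: Adam with learning rate 2e-5
# and cross-entropy over each word logit for a conditional sentence
# generator. The bart-base checkpoint, padding handling, and the shape of
# the training loop are assumptions.
import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # learning rate from the paper

def training_step(src_sentences: list[str], tgt_sentences: list[str]) -> float:
    """One update: condition on one sentence and generate its counterpart."""
    inputs = tokenizer(src_sentences, return_tensors="pt",
                       padding=True, truncation=True)
    labels = tokenizer(tgt_sentences, return_tensors="pt",
                       padding=True, truncation=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding positions
    # With labels provided, the model computes cross-entropy for each word logit.
    loss = model(**inputs, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```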