PARABANK: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-Constrained Neural Machine Translation
Authors: J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme (pp. 6521-6528)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present PARABANK, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of PARANMT (Wieting and Gimpel, 2018), we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. ... Using human judgments, we also demonstrate that PARABANK's paraphrases improve over PARANMT on both semantic similarity and fluency. |
| Researcher Affiliation | Academia | J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme, 3400 North Charles Street, Johns Hopkins University, Baltimore, MD, USA |
| Pseudocode | No | The paper describes the methods in narrative text and does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Using PARABANK, we train, and release to the public, a monolingual sentence re-writing system, which may be used to paraphrase unseen English sentences with lexical constraints. ... PARABANK is available for download at: http://nlp.jhu.edu/parabank. ... In addition to releasing hundreds of millions of English sentential paraphrases, we also release a free, pre-trained, model for monolingual sentential rewriting, as trained on PARABANK. |
| Open Datasets | Yes | The training data, CzEng 1.7 (Bojar et al., 2016)... We apply the same pipeline to the 10⁹-word French-English parallel corpus (Giga) (Callison-Burch et al., 2009). |
| Dataset Splits | No | The paper discusses data sampling for human evaluation ('We randomly sampled 100 Czech-English sentence pairs from each of the four English token lengths...') but does not provide explicit details about the train/validation/test splits used for training the neural machine translation model itself from the CzEng 1.7 or Giga corpora. |
| Hardware Specification | Yes | We trained the model on 2 Nvidia GTX 1080Ti for two weeks. |
| Software Dependencies | No | The paper cites software tools such as Sockeye (Hieber et al., 2017), spaCy (Honnibal and Montani, 2017), and MorphoDiTa (Straková, Straka, and Hajíč, 2014), but does not give the specific version numbers of these dependencies that would be needed for reproducibility. |
| Experiment Setup | Yes | The model's encoder and decoder are both 6-layer LSTMs with a hidden size of 1024 and an embedding size of 512. Additionally, the model has one dot-attention layer. |
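
To make the reported architecture concrete, here is a minimal PyTorch sketch of an encoder-decoder with the dimensions quoted in the Experiment Setup row (6-layer LSTM encoder and decoder, hidden size 1024, embedding size 512, a single dot-product attention layer). This is an illustrative reconstruction, not the authors' Sockeye configuration; the vocabulary sizes and class names are placeholders.

```python
# Illustrative sketch of the reported encoder-decoder dimensions.
# Not the authors' Sockeye model; vocab sizes are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqSketch(nn.Module):
    def __init__(self, src_vocab=50000, tgt_vocab=50000,
                 emb_size=512, hidden_size=1024, num_layers=6):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_size)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_size)
        self.encoder = nn.LSTM(emb_size, hidden_size, num_layers, batch_first=True)
        self.decoder = nn.LSTM(emb_size, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(2 * hidden_size, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        enc_out, enc_state = self.encoder(self.src_emb(src_ids))     # (B, S, H)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), enc_state)  # (B, T, H)
        # Single dot-product attention over the encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))         # (B, T, S)
        context = torch.bmm(F.softmax(scores, dim=-1), enc_out)      # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))       # (B, T, V)
```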
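The Open Source Code row notes that the released rewriter paraphrases unseen English sentences under lexical constraints. As a rough illustration of how such an input might be prepared for Sockeye's constrained decoding, the snippet below builds one JSON-formatted line; the `constraints` and `avoid` field names are assumptions about Sockeye's `--json-input` format and may differ across versions, and the example sentence and constraint words are invented.

```python
# Hedged sketch: one JSON input line for lexically-constrained decoding.
# Field names ("constraints" = must appear, "avoid" = must not appear) are
# assumptions about Sockeye's JSON input format, not confirmed by the paper.
import json

record = {
    "text": "he went to the store yesterday .",
    "constraints": ["purchased"],   # phrase the paraphrase must contain
    "avoid": ["went"],              # source word to keep out of the output
}
print(json.dumps(record))  # one JSON object per line, piped to the translator
```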