Monolingual Transfer Learning via Bilingual Translators for Style-Sensitive Paraphrase Generation
Authors: Tomoyuki Kajiwara, Biwa Miura, Yuki Arase (pp. 8042-8049)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results of formality style transfer indicated the effectiveness of both pre-training methods, and the method based on roundtrip translation achieves state-of-the-art performance. (A minimal sketch of the roundtrip-translation idea appears below the table.) |
| Researcher Affiliation | Collaboration | Tomoyuki Kajiwara (1), Biwa Miura (2), Yuki Arase (3); 1: Institute for Datability Science, Osaka University; 2: AI Samurai Inc.; 3: Graduate School of Information Science and Technology, Osaka University. Contact: kajiwara@ids.osaka-u.ac.jp, miura@aisamurai.co.jp, arase@ist.osaka-u.ac.jp |
| Pseudocode | No | The paper describes its methods in text and provides figures, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to several third-party tools and datasets (e.g., Moses toolkit, Sockeye, XLNet, sacreBLEU, GYAFC corpus, WMT-2017 En-De translation task), but it does not provide a link to the authors' own implementation code or explicitly state that their code is open-source. |
| Open Datasets | Yes | We evaluate the performance of the proposed methods with the GYAFC (Rao and Tetreault 2018), as shown in Table 2. ... For roundtrip translation in pre-training, we used the SAN model with the same setting as the paraphrase generation model. ... We used the dataset of WMT-2017 En-De translation task (Bojar et al. 2017) for our machine translators. |
| Dataset Splits | Yes | The GYAFC corpus ... Its development (Dev) and test (Test) sets are multi-referenced... Additionally, a raw corpus for pre-training was constructed from the Yahoo Answers L6 corpus. ... We extracted 3 million sentences as the training set and 3,000 sentences of the development set for each domain. ... For the Dev and Test sets, we used 2,999 sentence pairs of newstest-2016 and 3,004 sentence pairs of newstest-2017, respectively. (These split sizes are also collected in a short summary below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications). |
| Software Dependencies | No | The paper mentions several software tools and libraries (e.g., Moses toolkit, Sockeye toolkit, XLNet, sacreBLEU) but does not consistently provide specific version numbers for these dependencies. It notes "Sockeye's arxiv 1217 branch", but comparable version information is not given for the other tools, which limits full reproducibility. |
| Experiment Setup | Yes | Our RNN model uses a 4-layer long short-term memory of 1,024 hidden dimensions for both the encoder and decoder, and multi-layer perceptron attention with a layer size of 1,024. Our CNN model uses 8 layers in the encoder and decoder, where the hidden dimensions were set to 512. Its convolutional kernel size was set to 3. Our SAN model uses a 6-layer transformer with a model size of 512 and 8 attention heads. We used word embeddings in 512 dimensions tying the source, target, and the output layer's weight matrix. We added dropout to all embeddings and hidden layers. In addition, we applied layer-normalization and label-smoothing as regularization. All models were optimized using the Adam optimizer. The batch size was 4,096 tokens. We created a checkpoint for the model at every 200 updates. The training stopped after 32 checkpoints without improvement in the validation perplexity. (These hyperparameters are regrouped in the configuration sketch below the table.) |
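The roundtrip-translation pre-training mentioned in the Research Type row uses bilingual translators (En-De in this paper, trained on the WMT-2017 data) to turn a monolingual corpus into pseudo paraphrase pairs. The sketch below is a minimal illustration of that idea under our own assumptions, not the authors' released code: `translate_en_de` and `translate_de_en` are hypothetical stand-ins for the trained translation models, and the direction of the resulting pairs (roundtrip output as source, original as target) is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator, Tuple


def make_pseudo_paraphrases(
    sentences: Iterable[str],
    translate_en_de: Callable[[str], str],  # hypothetical English->German translator
    translate_de_en: Callable[[str], str],  # hypothetical German->English translator
) -> Iterator[Tuple[str, str]]:
    """Build (roundtrip, original) pseudo paraphrase pairs from monolingual text.

    Each sentence is translated En->De->En; the noisy roundtrip output is
    paired with the original sentence, giving pseudo parallel data for
    pre-training a sequence-to-sequence paraphrase model.
    """
    for original in sentences:
        pivot = translate_en_de(original)    # English -> German
        roundtrip = translate_de_en(pivot)   # German  -> English (noisy paraphrase)
        if roundtrip.strip() and roundtrip != original:
            # Pairing direction is an assumption: roundtrip as source, original as target.
            yield roundtrip, original
```

In the paper's setting, pairs like these would be generated from the Yahoo Answers L6 sample (3 million training sentences per domain) and used to pre-train the model before fine-tuning on GYAFC.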
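For quick reference, the split sizes quoted in the Dataset Splits row can be kept as a small bookkeeping dictionary. The structure and key names below are ours; the numbers simply restate what the row reports.

```python
# Bookkeeping summary of the reported split sizes; not code from the paper.
DATA_SPLITS = {
    # Yahoo Answers L6 raw corpus used for pre-training (per GYAFC domain).
    "pretraining_yahoo_answers_l6": {
        "train_sentences": 3_000_000,
        "dev_sentences": 3_000,
    },
    # WMT-2017 En-De data used to train the bilingual translators.
    "wmt17_en_de_translators": {
        "dev_pairs_newstest2016": 2_999,
        "test_pairs_newstest2017": 3_004,
    },
    # GYAFC supplies the fine-tuning train/Dev/Test splits; its Dev and Test
    # sets are multi-referenced (exact sizes are not quoted in the row above).
}
```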
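The Experiment Setup row packs the hyperparameters of three models and the shared training settings into a single quote. The dictionaries below regroup those values for readability; the key names are illustrative and are not Sockeye command-line options.

```python
# Hyperparameters regrouped from the Experiment Setup quote; key names are ours.
SHARED_TRAINING = {
    "embedding_dim": 512,                   # source/target/output embeddings, weights tied
    "regularization": ["dropout", "layer_normalization", "label_smoothing"],
    "optimizer": "Adam",
    "batch_size_tokens": 4096,
    "checkpoint_every_updates": 200,
    "early_stop_patience_checkpoints": 32,  # stop after 32 checkpoints without dev-perplexity gain
}

MODELS = {
    "RNN": {
        "cell": "LSTM",
        "layers": 4,            # encoder and decoder
        "hidden_dim": 1024,
        "attention": {"type": "multi-layer perceptron", "layer_size": 1024},
    },
    "CNN": {
        "layers": 8,            # encoder and decoder
        "hidden_dim": 512,
        "kernel_size": 3,
    },
    "SAN": {
        "architecture": "Transformer",
        "layers": 6,
        "model_size": 512,
        "attention_heads": 8,
    },
}
```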