Adapting Language Models for Non-Parallel Author-Stylized Rewriting

Authors: Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Anandhavelu Natarajan, Vasudeva Varma

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the efficacy of our approach, we propose a linguistically-motivated framework to quantify stylistic alignment of the generated text to the target author at lexical, syntactic and surface levels. Qualitative and quantitative assessment indicates that the proposed approach rewrites the input text with better alignment to the target style while preserving the original content better than state-of-the-art baselines. We evaluate our performance against 4 baselines, 3 of which are trained on non-parallel data, while the 4th one uses parallel data. The results for stylized rewriting of the test corpus to the various authors' styles (10 in total) are presented in Table 2.
Researcher Affiliation | Collaboration | IIIT Hyderabad, Adobe Research. syed.b@research.iiit.ac.in, {gaverma, balsrini, anandvn}@adobe.com, vv@iiit.ac.in
Pseudocode | No | The paper describes the proposed model and its training process in textual form and through a diagram (Figure 2), but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to open-source code for the methodology described, nor does it explicitly state that the code will be made publicly available.
Open Datasets | Yes | We collated a subset of the Gutenberg corpus (Lahiri 2014) consisting of 142 authors and 2,857 books written by them. To diversify the pre-training dataset, we use 1 million passages from Wikipedia (Radford et al. 2018) along with 3.6M passages from the Gutenberg corpus leading to a total of 4.6M passages for pre-training the LM.
Dataset Splits | Yes | Of these, we set aside 5000 passages for validation and 5000 for test during the pre-training stage.
Hardware Specification | No | The paper mentions training 'on a GPU' only in general terms in the related-work discussion of supervised approaches ('using readily available classification-based discriminators to guide the process of generation (Fu et al. 2018)'). For its own experimental setup, however, it does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions several software components and algorithms, such as the 'Transformer encoder', 'GELU activations', the 'Adam optimizer', 'Byte Pair Encoding (BPE)', and 'multi-bleu-detok.perl' for evaluation. However, it does not specify version numbers for these or any other software dependencies, which would be needed for a fully reproducible description.
Experiment Setup | Yes | During pre-training with MLM, we use the Transformer encoder (Vaswani et al. 2017) (12-layer) with GELU activations (Hendrycks and Gimpel 2017), 512 hidden units, 16 heads, a dropout rate of 0.1 and learned positional embeddings. We train our models with the Adam optimizer (Kingma and Ba 2014), and a learning rate of 10^-4. We use streams of 256 tokens and mini-batches of size 32. (...) p_drop and p_blank are set to 0.1 and the model is fine-tuned until convergence.
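
For reference, the hyperparameters quoted in the Experiment Setup row can be assembled into a minimal sketch using standard PyTorch modules. This is not the authors' implementation: the BPE vocabulary size and the feed-forward width are not reported in the paper and are placeholders here, and the corrupt function reflects one common reading of p_drop (token dropout) and p_blank (token blanking), which the paper sets to 0.1 without further detail.

```python
import random

import torch
import torch.nn as nn

# --- Pre-training configuration (values quoted in the Experiment Setup row) ---
# 12-layer Transformer encoder, 512 hidden units, 16 heads, GELU activations,
# dropout 0.1, learned positional embeddings, Adam with learning rate 1e-4,
# 256-token streams, mini-batches of 32. Vocabulary size and feed-forward
# width are NOT reported in the paper; the values below are placeholders.

VOCAB_SIZE = 40000       # placeholder (BPE vocabulary size not reported)
MAX_POSITIONS = 256      # stream length used during pre-training
D_MODEL = 512
N_HEADS = 16
N_LAYERS = 12
DROPOUT = 0.1

token_embedding = nn.Embedding(VOCAB_SIZE, D_MODEL)
position_embedding = nn.Embedding(MAX_POSITIONS, D_MODEL)  # learned positions

encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    dim_feedforward=2048,    # placeholder: not specified in the paper
    dropout=DROPOUT,
    activation="gelu",
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=N_LAYERS)

params = (
    list(token_embedding.parameters())
    + list(position_embedding.parameters())
    + list(encoder.parameters())
)
optimizer = torch.optim.Adam(params, lr=1e-4)

# --- Input corruption during fine-tuning (interpretation of p_drop / p_blank) ---
# The paper only states that p_drop and p_blank are set to 0.1; the sketch
# below assumes they control token dropout and token blanking, respectively.
def corrupt(tokens, p_drop=0.1, p_blank=0.1, blank_token="<blank>"):
    noised = []
    for tok in tokens:
        r = random.random()
        if r < p_drop:            # drop the token entirely
            continue
        if r < p_drop + p_blank:  # replace the token with a blank placeholder
            noised.append(blank_token)
        else:
            noised.append(tok)
    return noised
```

The sketch is meant only to make the quoted hyperparameters concrete; any detail not named in the quoted text (vocabulary size, feed-forward width, the exact corruption procedure) is an assumption and should be checked against the authors' description before reuse.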