RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Authors: Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, Jindong Chen, Lei Meng

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results show significant improvements over a variety of baselines."
Researcher Affiliation | Industry | "Google Research {leishu, luolc, jayakumar, yunzhu, canoee, simon, jdchen, leimeng}@google.com"
Pseudocode | No | The paper describes its methods verbally but does not include any explicit pseudocode blocks, algorithms, or flowcharts labeled as such.
Open Source Code | Yes | GitHub: https://github.com/google-research/google-research/tree/master/rewritelm
Open Datasets | Yes | "To facilitate this research, we introduce OPENREWRITEEVAL, a novel benchmark that covers a wide variety of rewriting types expressed through natural language instructions." and, from the appendix on OPENREWRITEEVAL data, "Data Sources. The source texts for the D_Formality, D_Paraphrase, D_Shorten, and D_Elaborate categories are from various datasets, including Multi-News (Fabbri et al. 2019), Wikipedia (Guo et al. 2020), PG-19 book (Rae et al. 2019)..."
Dataset Splits | No | The paper discusses 'training data' and an 'evaluation framework' (OPENREWRITEEVAL), and mentions 'validation' in the context of reward model training. However, it does not provide explicit train/validation/test dataset split percentages, sample counts, or references to predefined splits for its own experimental setup.
Hardware Specification | Yes | "We use 64 Tensor Processing Units (TPU) V3 chips for finetuning."
Software Dependencies | No | The paper mentions the 'Adafactor optimizer' but does not specify version numbers for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | "The batch size is 32, and the maximum training step is 5000. We use the Adafactor optimizer (Shazeer and Stern 2018) with a learning rate of 0.003. Both the input and output sequence lengths are set to 1024 tokens. The training dropout rate is 0.1. During inference, the temperature is set to 0.5, and the top-K value is 40."
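The experiment-setup hyperparameters reported in the last row can be collected into a short configuration sketch. This is a minimal illustration assuming a JAX/Optax training stack; the paper does not name its software framework, so the `optax.adafactor` call and the dictionary names `FINETUNE_CONFIG` and `DECODE_CONFIG` are assumptions, while the numeric values are those reported in the paper.

```python
import optax  # assumed optimizer library; the paper only names Adafactor (Shazeer and Stern 2018)

# Fine-tuning hyperparameters as reported in the paper; dict name is illustrative.
FINETUNE_CONFIG = {
    "batch_size": 32,
    "max_train_steps": 5000,
    "learning_rate": 3e-3,
    "max_input_length": 1024,   # tokens
    "max_output_length": 1024,  # tokens
    "dropout_rate": 0.1,
    "hardware": "64 x TPU v3",  # from the Hardware Specification row
}

# Inference-time decoding settings as reported in the paper.
DECODE_CONFIG = {
    "temperature": 0.5,
    "top_k": 40,
}

# Adafactor with the reported learning rate; using Optax here is an assumption,
# not the authors' confirmed implementation.
optimizer = optax.adafactor(learning_rate=FINETUNE_CONFIG["learning_rate"])
```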