RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Authors: Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, Jindong Chen, Lei Meng

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results show significant improvements over a variety of baselines."
Researcher Affiliation | Industry | "Google Research {leishu, luolc, jayakumar, yunzhu, canoee, simon, jdchen, leimeng}@google.com"
Pseudocode | No | The paper describes its methods verbally but does not include any explicit pseudocode blocks, algorithms, or flowcharts labeled as such.
Open Source Code | Yes | GitHub: https://github.com/google-research/google-research/tree/master/rewritelm
Open Datasets | Yes | "To facilitate this research, we introduce OPENREWRITEEVAL, a novel benchmark that covers a wide variety of rewriting types expressed through natural language instructions." and, from the appendix on OPENREWRITEEVAL data, "Data Sources. The source texts for the D_Formality, D_Paraphrase, D_Shorten, and D_Elaborate categories are from various datasets, including Multi-News (Fabbri et al. 2019), Wikipedia (Guo et al. 2020), PG-19 book (Rae et al. 2019)..."
Dataset Splits | No | The paper discusses 'training data' and an 'evaluation framework' (OPENREWRITEEVAL), and mentions 'validation' in the context of reward model training. However, it does not provide explicit train/validation/test dataset split percentages, sample counts, or references to predefined splits for its own experimental setup.
Hardware Specification | Yes | "We use 64 Tensor Processing Units (TPU) V3 chips for finetuning."
Software Dependencies | No | The paper mentions the 'Adafactor optimizer' but does not specify version numbers for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | "The batch size is 32, and the maximum training step is 5000. We use the Adafactor optimizer (Shazeer and Stern 2018) with a learning rate of 0.003. Both the input and output sequence lengths are set to 1024 tokens. The training dropout rate is 0.1. During inference, the temperature is set to 0.5, and the top-K value is 40."
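The experiment-setup hyperparameters reported in the last row can be collected into a short configuration sketch. This is a minimal illustration assuming a JAX/Optax training stack; the paper does not name its software framework, so the `optax.adafactor` call and the dictionary names `FINETUNE_CONFIG` and `DECODE_CONFIG` are assumptions, while the numeric values are those reported in the paper.

```python
import optax  # assumed optimizer library; the paper only names Adafactor (Shazeer and Stern 2018)

# Fine-tuning hyperparameters as reported in the paper; dict name is illustrative.
FINETUNE_CONFIG = {
    "batch_size": 32,
    "max_train_steps": 5000,
    "learning_rate": 3e-3,
    "max_input_length": 1024,   # tokens
    "max_output_length": 1024,  # tokens
    "dropout_rate": 0.1,
    "hardware": "64 x TPU v3",  # from the Hardware Specification row
}

# Inference-time decoding settings as reported in the paper.
DECODE_CONFIG = {
    "temperature": 0.5,
    "top_k": 40,
}

# Adafactor with the reported learning rate; using Optax here is an assumption,
# not the authors' confirmed implementation.
optimizer = optax.adafactor(learning_rate=FINETUNE_CONFIG["learning_rate"])
```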