Improving Open-Ended Text Generation via Adaptive Decoding

Authors: Wenhong Zhu, Hongkun Hao, Zhiwei He, Yiming Ai, Rui Wang

ICML 2024

Reproducibility Variable | Result | Supporting excerpt
Research Type | Experimental | "Experimental results reveal that our method balances diversity and coherence well. The human evaluation shows that our method can generate human-preferred text. Additionally, our method can potentially improve the reasoning ability of language models. In our experiments, we performed two open-ended text generation tasks: document continuation and story generation. The results suggest that our approach significantly enhances diversity while preserving coherence in GPT2-XL (1.5B) and Llama2-7B models."
Researcher Affiliation | Academia | "Wenhong Zhu, Hongkun Hao, Zhiwei He, Yiming Ai, Rui Wang; MT Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. Correspondence to: Rui Wang <wangrui12@sjtu.edu.cn>."
Pseudocode | Yes | "Algorithm 1: Adaptive Decoding Algorithm"
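The review does not reproduce Algorithm 1 itself. As an illustrative sketch only, adaptive decoding truncates the next-token distribution dynamically rather than with a fixed k or fixed p. The threshold-based acceptance rule below is a hypothetical simplification standing in for the entropy-based confidence increment defined in the paper; the function name and rule are assumptions, not the authors' implementation:

```python
def adaptive_candidate_set(probs, epsilon=0.0005):
    """Sketch of adaptive truncation of a next-token distribution.

    Tokens join the candidate set in descending-probability order; the
    threshold test below is a stand-in for the paper's confidence
    increment (Algorithm 1), not the actual criterion.
    """
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [order[0]]  # the most probable token is always kept
    for i in order[1:]:
        # Hypothetical acceptance rule: stop once a token's probability
        # mass falls below the threshold epsilon.
        if probs[i] < epsilon:
            break
        kept.append(i)
    # Renormalise the surviving mass before sampling.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

Unlike top-k or top-p, the size of the candidate set here varies with the shape of the distribution at each step, which is the intuition behind the method's diversity/coherence trade-off.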
Open Source Code | Yes | "The code is available at https://github.com/zwhong714/adaptive_decoding."
Open Datasets | Yes | "We explore two open-ended text generation applications: document continuation using the WikiText-103 dataset (Merity et al., 2017), which contains a large collection of Wikipedia articles, and story generation on the WritingPrompts dataset (Fan et al., 2018), a notably challenging endeavor."
Dataset Splits | Yes | "We randomly select 1,200 samples from the training set of each dataset, using 1,000 of them to evaluate the different decoding algorithms and the remaining 200 to select hyperparameters."
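The split above is straightforward to mirror; this is a generic sketch (the seed and function name are assumptions, since the paper does not state a seed):

```python
import random

def make_splits(dataset, seed=0):
    """Sample 1200 examples: 1000 for evaluating decoding algorithms,
    200 for hyperparameter selection, as described in the review row above."""
    # Draw 1200 examples without replacement; the seed is an assumption.
    rng = random.Random(seed)
    sample = rng.sample(dataset, 1200)
    return sample[:1000], sample[1000:]
```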
Hardware Specification | Yes | "Table 9. Decoding latency. The computational hardware is an NVIDIA RTX 3090, with the model loaded in float16."
Software Dependencies | No | The paper mentions models such as GPT2-XL, Llama2-7B, and Llama2-7B-chat, but does not give version numbers for its software dependencies (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | "Hyperparameter scans can be found in Appendix D, and we select the hyperparameters that yield the optimal MAUVE score (Meister et al., 2023b) to guarantee fairness. We utilized its official generation configuration with the temperature set to 0.6. As illustrated in Table 5, our decoding algorithm demonstrates the ability to enhance generation quality. p is set to 0.95, while the threshold for adaptive decoding is set to 0.0005."