Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Contrastive Framework for Neural Text Generation
Authors: Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and analyses on three benchmarks from two languages demonstrate that our proposed approach significantly outperforms current state-of-the-art text generation methods as evaluated by both human and automatic metrics. |
| Researcher Affiliation | Collaboration | Language Technology Lab, University of Cambridge; Tencent AI Lab; DeepMind; Department of Computer Science, The University of Hong Kong |
| Pseudocode | No | The paper describes the algorithms in text but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and models are publicly available at https://github.com/yxuansu/SimCTG. |
| Open Datasets | Yes | We conduct experiments on the Wikitext-103 dataset [16] |
| Dataset Splits | Yes | The hyperparameters of different methods are selected based on their optimal MAUVE (detailed in Section 4.1.2) performance on the validation set. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA Tesla A100 GPU. |
| Software Dependencies | No | The paper mentions 'Huggingface Library [28]' but does not specify version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | For our SimCTG and the MLE baseline, we fine-tune the models on Wikitext-103 for 40k training steps. The batch size is set as 128 and the training samples are truncated to a maximum length of 256. We optimize the model with Adam optimizer [12] and a learning rate of 2e-5. |
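The experiment-setup row above can be captured as a small configuration record, which makes the reported settings easy to check against a re-run. The sketch below is hypothetical: the class name `FineTuneConfig`, its field names, and the `max_tokens_seen` helper are illustrative assumptions, not the argument names used in the SimCTG repository. It records only the values quoted in the table (40k steps, batch size 128, maximum length 256, Adam, learning rate 2e-5) and derives an upper bound on the number of training tokens processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical record of the fine-tuning settings quoted in the table.

    Field names are illustrative assumptions; the official scripts may
    name or structure these arguments differently.
    """
    dataset: str = "wikitext-103"  # fine-tuning corpus
    train_steps: int = 40_000      # "40k training steps"
    batch_size: int = 128          # sequences per optimization step
    max_length: int = 256          # training samples truncated to this many tokens
    optimizer: str = "adam"        # Adam optimizer
    learning_rate: float = 2e-5    # constant reported learning rate

    def max_tokens_seen(self) -> int:
        """Upper bound on non-padding training tokens: steps x batch x max length.

        Sequences shorter than max_length contribute fewer real tokens,
        so the actual count can only be lower than this value.
        """
        return self.train_steps * self.batch_size * self.max_length


config = FineTuneConfig()
print(config.max_tokens_seen())  # 40_000 * 128 * 256
```

Running it prints 1310720000, i.e. at most about 1.31 billion tokens over the whole fine-tuning run under these assumptions. A frozen dataclass is used so the reported values cannot be mutated accidentally during a reproduction attempt.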