Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
TextGAIL: Generative Adversarial Imitation Learning for Text Generation
Authors: Qingyang Wu, Lei Li, Zhou Yu
Pages: 14067-14075
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we conduct experiments on a diverse set of unconditional and conditional text generation tasks. Experimental results show that TextGAIL achieves better performance in terms of both quality and diversity than the MLE baseline. |
| Researcher Affiliation | Collaboration | Qingyang Wu¹, Lei Li², Zhou Yu¹; ¹University of California, Davis; ²ByteDance AI Lab; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: TextGAIL |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For unconditional generation tasks, previous text GANs often only perform experiment on unconditional generation tasks: COCO and EMNLP2017 News. We extend the experiments to conditional generation tasks, as more practical applications. Specifically, we experiment our model on CommonGEN and ROCStories. |
| Dataset Splits | No | The paper mentions using a 'part of the training set' for warm-up and describes stopping criteria, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or exact sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'GPT-2 base' and 'RoBERTa-base' models, but it does not specify any general software dependencies like programming language versions (e.g., Python 3.x) or library versions (e.g., TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | The human demonstrations mix ratio p is set to 0.3 at the start of the training and linearly decay afterward. The constant reward for human demonstrations is set to 2.0. ... We perform beam search with a beam size of four on the two conditional generation tasks. ... We observe the model has less repetition and better quality with nucleus sampling with hyper-parameters top-p 0.9 and temperature 0.8. |
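
The hyperparameters quoted in the Experiment Setup row can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: the function names (`mix_ratio`, `nucleus_filter`, `sample_token`) are hypothetical, and the decay endpoint of 0.0 and the training horizon `total_steps` are assumptions, since the quoted text says only that the ratio starts at 0.3 and decays linearly. It shows a linear schedule for the human-demonstration mix ratio and a plain implementation of nucleus (top-p) sampling with temperature scaling.

```python
import math
import random

# Values quoted in the Experiment Setup row.
INITIAL_MIX_RATIO = 0.3   # human demonstrations mix ratio p at the start
HUMAN_DEMO_REWARD = 2.0   # constant reward for human demonstrations (recorded only)
BEAM_SIZE = 4             # beam size for the conditional tasks (recorded only)
TOP_P = 0.9               # nucleus sampling threshold
TEMPERATURE = 0.8         # sampling temperature


def mix_ratio(step, total_steps, start=INITIAL_MIX_RATIO, end=0.0):
    """Linearly interpolate the mix ratio from `start` to `end`.

    ASSUMPTION: the end value (0.0) and the horizon `total_steps`
    are not given in the quoted text.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def nucleus_filter(logits, top_p=TOP_P, temperature=TEMPERATURE):
    """Return {token_index: probability} over the smallest set of tokens
    whose cumulative probability reaches `top_p`, renormalized to sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng=random):
    """Draw one token index from the nucleus-filtered distribution."""
    dist = nucleus_filter(logits)
    indices = list(dist)
    weights = [dist[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

For example, `mix_ratio(0, 1000)` returns the quoted starting value 0.3, and `sample_token` applied to a next-token logit vector draws only from the tokens that together cover at least 90% of the temperature-scaled probability mass.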