Generative Cooperative Networks for Natural Language Generation

Authors: Sylvain Lamprier, Thomas Scialom, Antoine Chaffin, Vincent Claveau, Ewa Kijak, Jacopo Staiano, Benjamin Piwowarski

ICML 2022

Reproducibility Assessment

Each variable below lists the assessed result, followed by the LLM response quoting or summarizing the supporting evidence from the paper.

Research Type: Experimental
LLM Response: "4. Experiments. To evaluate the framework, we experiment on standard complementary unconditional and conditional NLG tasks, with the following datasets: [...]" and "Table 1. Final results on QG and Summarization test sets, in terms of BLEU-4 (B), ROUGE-1 (R-1) and ROUGE-L (R-L)."

Researcher Affiliation: Academia
LLM Response: "1 ISIR, Sorbonne Université, Paris, France; 2 reciTAL, Paris, France; 3 IRISA, Rennes, France; 4 IMATAG, Rennes, France; 5 CNRS."

Pseudocode: Yes
LLM Response: "Algorithm 1 RML-GAN" and "Algorithm 2 Generative Cooperative Networks".
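
For orientation, here is a minimal, self-contained sketch of the cooperative loop that Algorithm 2 describes: the discriminator is trained to separate real data from generated samples, while the generator is trained by maximum likelihood on sequences drawn from a discriminator-guided distribution q(x) proportional to p_theta(x) * D(x). All names are illustrative, not the authors' code; the paper uses T5 models and MCTS decoding, whereas the sketch below uses toy unigram/logistic stand-ins and a best-of-n filter in place of MCTS.

```python
import math
import random

VOCAB = list(range(10))  # toy token ids; 0 acts as end-of-sequence
MAXLEN = 8

class Generator:
    """Toy unigram generator, trained by maximum likelihood (add-one counts)."""
    def __init__(self):
        self.counts = {w: 1.0 for w in VOCAB}

    def probs(self):
        z = sum(self.counts.values())
        return {w: c / z for w, c in self.counts.items()}

    def sample(self):
        p, seq = self.probs(), []
        for _ in range(MAXLEN):
            w = random.choices(list(p), weights=list(p.values()))[0]
            if w == 0:
                break
            seq.append(w)
        return seq

    def mle_update(self, seqs):
        for s in seqs:
            for w in s:
                self.counts[w] += 1.0

class Discriminator:
    """Toy logistic scorer over token occurrences; D(x) lies in (0, 1)."""
    def __init__(self):
        self.weight = {w: 0.0 for w in VOCAB}

    def score(self, seq):
        s = sum(self.weight[w] for w in seq)
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, positives, negatives, lr=0.1):
        labeled = [(s, 1.0) for s in positives] + [(s, 0.0) for s in negatives]
        for seq, y in labeled:
            g = y - self.score(seq)  # gradient of the logistic log-likelihood
            for w in seq:
                self.weight[w] += lr * g

def guided_sample(gen, disc, n_candidates=8):
    # Stand-in for MCTS decoding: draw candidates from p_theta and keep the
    # one the discriminator scores highest, crudely approximating
    # q(x) proportional to p_theta(x) * D(x).
    candidates = [gen.sample() for _ in range(n_candidates)]
    return max(candidates, key=disc.score)

def cooperative_training(real_data, steps=50, batch=4):
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        samples = [guided_sample(gen, disc) for _ in range(batch)]
        disc.update(random.sample(real_data, batch), samples)
        gen.mle_update(samples)  # pull p_theta toward the guided distribution q
    return gen, disc

# "Real" data concentrated on tokens 1-3; the generator should drift toward it.
real = [[random.choice([1, 2, 3]) for _ in range(5)] for _ in range(100)]
cooperative_training(real)
```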

Open Source Code: No
LLM Response: The paper uses the Hugging Face library for T5 models, but provides neither a link nor an explicit statement regarding the availability of its own GCN implementation.

Open Datasets: Yes
LLM Response: "EMNLP2017 News dataset", "SQuAD dataset (Rajpurkar et al., 2016)", and "CNN/Daily Mail dataset (CNNDM) (Nallapati et al., 2016)".
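
The two conditional-task datasets are available on the Hugging Face Hub; a minimal loading sketch (our illustration of how they can be obtained, not a step documented in the paper, which does not say how the data was retrieved):

```python
from datasets import load_dataset

# SQuAD v1.1 (Rajpurkar et al., 2016), used for question generation.
squad = load_dataset("squad")

# CNN/Daily Mail (Nallapati et al., 2016), used for summarization;
# "3.0.0" is the non-anonymized configuration on the Hub.
cnndm = load_dataset("cnn_dailymail", "3.0.0")

print(squad)   # DatasetDict with 'train' and 'validation' splits
print(cnndm)   # DatasetDict with 'train', 'validation', and 'test' splits
```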

Dataset Splits: No
LLM Response: The paper mentions "tuned on a validation set" and "validation set", but provides neither specific percentages or counts for the training/validation/test splits of the datasets used, nor an explicit reference to standard splits for these datasets.

Hardware Specification: Yes
LLM Response: "Using 4 Nvidia V100 SXM2 GPUs, GCN (q̂ = MCTS) training took 32 hours for summarization, and 8 hours for QG."

Software Dependencies: No
LLM Response: The paper mentions using T5 models implemented in the Hugging Face library and refers to the Transformer-based T5, but it does not specify version numbers for software dependencies such as Python, PyTorch, or the Hugging Face library itself.
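
For reproduction, the T5 models can be loaded through the Hugging Face transformers library; a minimal sketch with an assumed checkpoint name, since the paper does not state the exact T5 size or library versions used:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Checkpoint name assumed for illustration; the paper does not specify it.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "summarize: The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```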

Experiment Setup: Yes
LLM Response: "During this pre-training, we used a learning rate fixed to 5e-6 for both the discriminator and the generator, and a number of epochs set to 5."; "We tested on a validation set different values for our hyper-parameter Cpuct ∈ {1.0, 2.0, 3.0, 4.0} and found that 3.0 gives the best results. We thus only report the results with Cpuct = 3.0."; "For the budget allocated to the MCTS we tested different numbers of simulations per token for the MLE model with n ∈ {5, 10, 25, 50, 100} and observed no significant improvement between 50 and 100. We hence used n = 50 for all our experiments."; "max sequence length (512 in our experiments)"; "ϵ = 0.1 and σ = 0.1 in our experiments".
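
Cpuct is the exploration constant of a PUCT-style selection rule in the MCTS decoder, and n is the number of simulations run before each emitted token. A minimal sketch of the selection step, assuming the standard AlphaZero-style PUCT formula (the exact variant is our assumption; in the paper's setting the value estimates would come from the discriminator and the priors from the generator):

```python
import math

def puct_select(children, c_puct=3.0):
    """Pick the child token maximizing Q + c_puct * prior * sqrt(N_total) / (1 + n).

    `children` maps a candidate next token to a dict with keys:
      'q'     running mean of the values backed up from discriminator scores,
      'n'     visit count of this child,
      'prior' the generator probability p_theta(token | prefix).
    With n = 50 simulations per token (the paper's budget), this selection
    would be applied 50 times per tree before committing to a token.
    """
    total = sum(ch["n"] for ch in children.values())

    def score(ch):
        exploration = c_puct * ch["prior"] * math.sqrt(total) / (1 + ch["n"])
        return ch["q"] + exploration

    return max(children, key=lambda tok: score(children[tok]))

# Toy usage: three candidate next tokens after some prefix.
children = {
    "cat": {"q": 0.40, "n": 3, "prior": 0.5},
    "dog": {"q": 0.55, "n": 1, "prior": 0.3},
    "fox": {"q": 0.00, "n": 0, "prior": 0.2},
}
print(puct_select(children))  # balances estimated value against exploration
```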