Branch-GAN: Improving Text Generation with (not so) Large Language Models

Authors: Fredrik Carlsson, Johan Broberg, Erik Hillbom, Magnus Sahlgren, Joakim Nivre

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "A comprehensive human evaluation shows that our method significantly improves the quality of texts generated by the model while avoiding the previously reported sparsity problems of GAN approaches. Even our smaller models outperform larger original baseline models with more than 16 times the number of parameters. We begin with a large-scale human evaluation of text quality (Section 4.2), explore automatic metrics that correlate with human quality judgments (Section 4.3) and use these metrics in an automatic model evaluation (Section 4.4). We conclude with a robustness evaluation (Section 4.5) and an ablation study focusing on the hyperparameters depth and sparsity (Section 4.6)."
Researcher Affiliation | Academia | Fredrik Carlsson, Johan Broberg, Erik Hillbom, Magnus Sahlgren, Joakim Nivre; RISE Research Institutes of Sweden; AI Sweden. Correspondence: fredrik.carlsson@ri.se
Pseudocode | No | The paper describes the generator and discriminator mechanisms in text and with diagrams, but it does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Implementation at Github.com/FreddeFrallan/Branch-GAN. "The code used for training and evaluation, along with the human annotations underlying the evaluation in Section 4.2, is available at Github.com/FreddeFrallan/Branch-GAN."
Open Datasets | Yes | "We use checkpoints from the Pythia model suite (Biderman et al., 2023), which were trained on the full Pile dataset (Gao et al., 2020)... Our training data consists of 100k randomly selected sequences of length 128, tokenized using the Pythia tokenizer... All datasets used for further training and evaluation are likewise publicly available."
Dataset Splits | No | The paper specifies training data and evaluation data, but it does not explicitly define a separate validation split, or its size, for hyperparameter tuning during training.
Hardware Specification | Yes | "The hardware for this training was a single DGX machine with 8 NVIDIA A100-SXM4-40GB GPUs."
Software Dependencies | No | The paper states "straight-forward Python PyTorch" but does not specify version numbers for Python, PyTorch, or any other libraries or dependencies used.
Experiment Setup | Yes | "We train using a branch sequence depth of d=16 and K=32 branches per sample. This is performed with a batch size of 8, for 4 epochs over the training data, resulting in 50k optimizer updates, where 4/5 of the tokens come from generated sequences. The discriminator loss weight is set to α = 0.2." (Table 5: Hyperparameters for Branch-GAN)
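The training hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration, not the released code; the dictionary keys are hypothetical names chosen for readability:

```python
# Hypothetical config mirroring the hyperparameters reported in the paper
# (d=16, K=32, batch size 8, 4 epochs, 50k updates, alpha=0.2).
branch_gan_config = {
    "branch_depth": 16,            # d: branch sequence depth
    "branches_per_sample": 32,     # K: branches generated per sample
    "batch_size": 8,
    "epochs": 4,
    "optimizer_updates": 50_000,
    "generated_token_fraction": 4 / 5,  # share of tokens from generated sequences
    "discriminator_loss_weight": 0.2,   # alpha
}
```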
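The data selection described under Open Datasets (100k randomly selected sequences of length 128) can be sketched as follows. The function name, toy corpus, and seed are illustrative assumptions; real use would first tokenize the Pile with the Pythia tokenizer and draw from that token stream:

```python
import random

def sample_training_sequences(token_ids, num_sequences=100_000, seq_len=128, seed=0):
    """Randomly select fixed-length training sequences from a flat token stream."""
    rng = random.Random(seed)
    max_start = len(token_ids) - seq_len
    starts = [rng.randrange(max_start + 1) for _ in range(num_sequences)]
    return [token_ids[s:s + seq_len] for s in starts]

# Toy stand-in for a tokenized corpus; the paper uses the Pile with the Pythia tokenizer.
corpus = list(range(10_000))
sequences = sample_training_sequences(corpus, num_sequences=5, seq_len=128)
```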