Branch-GAN: Improving Text Generation with (not so) Large Language Models
Authors: Fredrik Carlsson, Johan Broberg, Erik Hillbom, Magnus Sahlgren, Joakim Nivre
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A comprehensive human evaluation shows that our method significantly improves the quality of texts generated by the model while avoiding the previously reported sparsity problems of GAN approaches. Even our smaller models outperform larger original baseline models with more than 16 times the number of parameters. We begin with a large-scale human evaluation of text quality (Section 4.2), explore automatic metrics that correlate with human quality judgments (Section 4.3) and use these metrics in an automatic model evaluation (Section 4.4). We conclude with a robustness evaluation (Section 4.5) and an ablation study focusing on the hyperparameters depth and sparsity (Section 4.6). |
| Researcher Affiliation | Academia | Fredrik Carlsson, Johan Broberg, Erik Hillbom, Magnus Sahlgren, Joakim Nivre; RISE Research Institutes of Sweden; AI Sweden. Correspondence: fredrik.carlsson@ri.se |
| Pseudocode | No | The paper describes the generator and discriminator mechanisms in text and with diagrams, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Implementation at: Github.com/FreddeFrallan/Branch-GAN. The code used for training and evaluation, along with the human annotations underlying the evaluation in Section 4.2, are available at Github.com/FreddeFrallan/Branch-GAN. |
| Open Datasets | Yes | We use checkpoints from the Pythia model suite (Biderman et al., 2023), which were trained on the full Pile dataset (Gao et al., 2020)... Our training data consists of 100k randomly selected sequences of length 128, tokenized using the Pythia tokenizer... All datasets used for further training and evaluation are likewise publicly available. (A hedged data-loading sketch is given after the table.) |
| Dataset Splits | No | The paper specifies training data and evaluation data, but it does not explicitly define a separate validation set split or its size for hyperparameter tuning during training. |
| Hardware Specification | Yes | The hardware for this training was a single DGX machine with 8 NVIDIA A100-SXM4-40GB GPUs. |
| Software Dependencies | No | The paper states "straight-forward Python PyTorch" but does not specify version numbers for Python, PyTorch, or any other libraries or dependencies used. |
| Experiment Setup | Yes | We train using a branch sequence depth of d=16 and K=32 branches per sample. This is performed with a batch size of 8, for 4 epochs over the training data, resulting in 50k optimizer updates, where 4/5 of the tokens come from generated sequences. The discriminator loss weight is set to α = 0.2. (See Table 5: Hyperparameters for Branch-GAN; a hedged configuration sketch follows the table below.) |
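As a reading aid for the Open Datasets row, here is a minimal sketch of how the reported data setup could be reproduced: loading a Pythia checkpoint and tokenizing 128-token sequences. The checkpoint size (`EleutherAI/pythia-410m`) and the use of the Hugging Face `transformers` library are illustrative assumptions; the paper only states that Pythia checkpoints, the Pythia tokenizer, and 100k Pile-derived sequences of length 128 are used.

```python
# Hedged sketch: load a Pythia checkpoint and tokenize 128-token sequences.
# The checkpoint name and the use of `transformers` are assumptions for
# illustration; the paper only specifies Pythia checkpoints and tokenizer.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "EleutherAI/pythia-410m"  # assumed size, purely illustrative

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# GPT-NeoX tokenizers may not define a pad token by default.
tokenizer.pad_token = tokenizer.eos_token

# Placeholder for the 100k randomly selected Pile sequences; the exact
# sampling procedure is not specified beyond "randomly selected".
raw_texts = ["..."]  # replace with sampled Pile documents

encodings = tokenizer(
    raw_texts,
    truncation=True,
    max_length=128,        # sequence length reported in the paper
    padding="max_length",
    return_tensors="pt",
)
print(encodings["input_ids"].shape)  # (num_sequences, 128)
```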
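Similarly, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is a sketch only: the field names are invented here and may differ from the configuration keys used in the official repository (Github.com/FreddeFrallan/Branch-GAN).

```python
# Hedged sketch of the reported Branch-GAN training hyperparameters.
# Field names are illustrative, not taken from the official code.
from dataclasses import dataclass

@dataclass
class BranchGANConfig:
    branch_depth: int = 16                  # d: branch sequence depth
    branches_per_sample: int = 32           # K: branches per sample
    batch_size: int = 8
    epochs: int = 4                         # 100k sequences / batch 8 * 4 epochs = 50k updates
    sequence_length: int = 128
    discriminator_loss_weight: float = 0.2  # alpha in the paper

config = BranchGANConfig()
print(config)
```

The epoch count and batch size are consistent with the reported 50k optimizer updates (100,000 sequences / 8 per batch * 4 epochs = 50,000 steps).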