Tractable Control for Autoregressive Language Generation
Authors: Honghua Zhang, Meihua Dang, Nanyun Peng, Guy Van den Broeck
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of GeLaTo on challenging benchmarks for constrained generation: CommonGen (Lin et al., 2020), Yelp!Review (Cho et al., 2019) and News (Zhang et al., 2020); in particular, we focus on CommonGen for detailed analysis. For both unsupervised and supervised settings, GeLaTo achieves state-of-the-art performance in terms of various automatic evaluation metrics including BLEU score while guaranteeing 100% constraint satisfaction. Main evaluation results are presented in Table 1. |
| Researcher Affiliation | Academia | Honghua Zhang*, Meihua Dang*, Nanyun Peng, Guy Van den Broeck; Department of Computer Science, University of California, Los Angeles, USA. |
| Pseudocode | Yes | Algorithm 1 Constrained Sampling with GeLaTo (see the sketch after the table) |
| Open Source Code | Yes | In this section, we demonstrate the effectiveness of GeLaTo (https://github.com/UCLA-StarAI/GeLaTo) on challenging benchmarks for constrained generation |
| Open Datasets | Yes | CommonGen (Lin et al., 2020) is a benchmark for constrained generation with lexical constraints... We also evaluate GeLaTo on the Yelp!Review (Cho et al., 2019) and the News (Zhang et al., 2020) datasets. |
| Dataset Splits | Yes | For hyper-parameter tuning, we conduct cross-validation on a small subset of the training set and report evaluation results for both validation (dev) and test set. |
| Hardware Specification | Yes | all methods are evaluated on a single NVIDIA A100 GPU with 40 GB memory |
| Software Dependencies | No | The paper mentions the use of the 'Juice.jl framework (Dang et al., 2021)' and 'LemmInflect', but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Unsupervised Setting: we fine-tune the model for 1 epoch with learning rate = 1e-6. Supervised Setting: for 3 epochs with learning rate = 1e-6. Training HMMs: we train HMMs with the expectation-maximization (EM) algorithm for 40 epochs, and we resample 0.2 million examples for each epoch. Decoding: We adopt beam search to greedily search for x_{1:n} that maximizes p(x_{1:n} | α); we experiment with different beam sizes: 16, 32, 64 and 128 (see the beam-search sketch after the table). |
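
The Pseudocode row above quotes only the title of Algorithm 1. To make the quoted rows about constrained sampling concrete, here is a minimal sketch of the idea the paper describes: a distilled HMM supplies p_HMM(α | x_{1:t}), which reweights the base language model's next-token distribution. The `lm` and `hmm` interfaces, the function name, and the per-token loop are assumptions for illustration only (the paper computes these HMM conditionals jointly via dynamic programming over the HMM), not the authors' implementation.

```python
import numpy as np

def constrained_next_token_probs(lm, hmm, prefix, alpha):
    """Hypothetical sketch of HMM-guided next-token reweighting.

    Assumes `lm.next_token_probs(prefix)` returns p_LM(x_t | x_{1:t-1}) over the
    vocabulary and `hmm.constraint_prob(tokens, alpha)` returns p_HMM(alpha | tokens);
    both interfaces are illustrative, not part of the released GeLaTo code.
    """
    p_lm = lm.next_token_probs(prefix)                    # shape: (vocab_size,)
    p_sat = np.array([hmm.constraint_prob(prefix + [v], alpha)
                      for v in range(len(p_lm))])         # p_HMM(alpha | x_{1:t-1}, v)
    scores = p_lm * p_sat                                 # unnormalized p(x_t | x_{1:t-1}, alpha)
    return scores / scores.sum()                          # renormalize over the vocabulary
```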
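
The Decoding entry reports beam search over p(x_{1:n} | α) with beam sizes 16, 32, 64 and 128. Building on the scoring function above, a minimal beam-search loop might look as follows; the `eos_id` default, the length cap, and the termination handling are assumptions, not the paper's exact procedure.

```python
def constrained_beam_search(lm, hmm, alpha, beam_size=32, max_len=32, eos_id=50256):
    """Hypothetical beam search for a high-probability x_{1:n} under p(x_{1:n} | alpha)."""
    beams = [([], 0.0)]                                   # (token prefix, log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            if prefix and prefix[-1] == eos_id:           # finished hypotheses pass through
                candidates.append((prefix, logp))
                continue
            probs = constrained_next_token_probs(lm, hmm, prefix, alpha)
            # extend each prefix with its beam_size most probable next tokens
            for tok in np.argsort(probs)[-beam_size:]:
                candidates.append((prefix + [int(tok)], logp + float(np.log(probs[tok]))))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]                                    # highest-scoring token sequence
```

Running this loop with beam_size set to 16, 32, 64 and 128 would mirror the beam-size sweep described in the Experiment Setup row.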