KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation
Authors: Yuxi Feng, Xiaoyuan Yi, Laks V.S. Lakshmanan, Xing Xie
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines. |
| Researcher Affiliation | Collaboration | ¹The University of British Columbia, Vancouver, Canada; ²Microsoft Research Asia, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Training Process of KEST |
| Open Source Code | Yes | Code and appendices are available at https://github.com/peterfengyx/KEST. |
| Open Datasets | Yes | We evaluate the sentiment controllability on the IMDb movie review dataset [Maas et al., 2011]. We use the AGNews dataset [Zhang et al., 2015] to evaluate topic controllability. We use the Jigsaw Toxicity Dataset for training... |
| Dataset Splits | Yes | For IMDb, we sample 5% of the training samples as labeled data and directly take their provided unlabeled set. Since there is no separate unlabeled text in AGNews, we sample 3% of training samples as labeled data and use the others as unlabeled ones. For a fair comparison, we keep the ratio of labeled/pseudo/unlabeled text to 1:1:30. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or types of processors used for experiments. |
| Software Dependencies | No | The paper mentions software like UniLM, AdamW, RoBERTa-large, BERT-base, and GPT2-XL, but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | We use AdamW [Loshchilov and Hutter, 2019] with learning rate = 5e-5, warm-up steps = one epoch, and batch size = 8 for optimization. The top-p (p = 0.9) sampling method is used for decoding in evaluation. We set λc = λag = λnag = 1.0 in Eq. (4) across all tasks. |
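The decoding method in the setup above, top-p (nucleus) sampling with p = 0.9, can be illustrated with a minimal sketch. This is not the paper's implementation; the function name and the toy probability vector are illustrative assumptions. The idea is to keep only the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize before sampling:

```python
def top_p_filter(probs, p=0.9):
    """Nucleus (top-p) filtering over a token probability vector.

    Keeps the smallest set of highest-probability tokens whose
    cumulative probability reaches p, zeroes out the rest, and
    renormalizes the kept mass to sum to 1.
    (Illustrative sketch, not the KEST implementation.)
    """
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:  # nucleus is complete once mass p is covered
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize surviving tokens
    return filtered

# Toy distribution over 4 tokens: with p = 0.9, the top three tokens
# (cumulative 0.95) survive and are renormalized; the last is dropped.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, p=0.9))
```

In practice this filtering is applied to the model's softmax output at each decoding step, and the next token is sampled from the renormalized distribution.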