CoCon: A Self-Supervised Approach for Controlled Text Generation
Authors: Alvin Chan, Yew-Soon Ong, Bill Pung, Aston Zhang, Jie Fu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we show that CoCon can naturally incorporate target content into generated texts and control high-level text attributes in a zero-shot manner. We conduct a range of experiments on CoCon to study its control over generated texts and the quality of these texts. |
| Researcher Affiliation | Collaboration | Alvin Chan¹, Yew-Soon Ong¹, Bill Pung¹, Aston Zhang², Jie Fu³ (¹Nanyang Technological University, ²Amazon AI, ³Mila, Polytechnique Montreal) |
| Pseudocode | No | The paper describes the CoCon architecture and training process in text and diagrams (Figure 1, Figure 2) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and models are available at: https://github.com/alvinchangw/COCON |
| Open Datasets | Yes | We train CoCon for 2 epochs on publicly available GPT-2 medium output texts (250K train samples) that are generated with top-40 k-sampling (samples from: https://github.com/openai/gpt-2-output-dataset). The training samples (x) are 30-BPE long segments sampled from these GPT-2 output texts. |
| Dataset Splits | Yes | The training samples (x) are 30-BPE long segments sampled from these GPT-2 output texts. Subsequently, the x^a and x^b segments are split from x at a breakpoint between the 8th to 12th BPE position, uniformly sampled during training. The content input (c) and prompt text (p) are randomly sourced from different GPT-2 output samples that are withheld from CoCon training. |
| Hardware Specification | Yes | it takes less than 24 hours to train CoCon on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions software like 'GPT-2' and 'Huggingface versions' but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, or the Huggingface library). |
| Experiment Setup | Yes | In all our experiments, the GPT-2 medium 345M model (Radford et al., 2019) is used as the pretrained LM for CoCon. CoCon's LMα comprises the first 7 GPT-2 Transformer blocks while the remaining 17 blocks make up LMβ in our experiments. The CoCon block's architecture mirrors a single GPT-2 Transformer block with a dimension size of 1024. We train CoCon for 2 epochs on publicly available GPT-2 medium output texts... The training samples (x) are 30-BPE long segments... the x^a and x^b segments are split from x at a breakpoint between the 8th to 12th BPE position, uniformly sampled during training. The discriminator (f_disc) consists of a 1-D convolutional layer, followed by a linear layer with 2 class outputs, and is trained once for every 5 CoCon training steps. To simplify hyperparameter tuning, we set λ = 1 for all four CoCon loss terms and τ_content = 0 for our results. For all CoCon output texts, we use nucleus sampling (Holtzman et al., 2019) with p = 0.9 to draw the next token from the vocabulary's softmax distribution. |
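
The dataset-splits row describes how each 30-BPE training segment x is divided into x^a and x^b at a breakpoint sampled uniformly between the 8th and 12th BPE position. The following is a minimal sketch of that splitting step only; the function name `split_sample` and the dummy segment are illustrative and not from the authors' code.

```python
import random

def split_sample(bpe_ids, lo=8, hi=12):
    """Split a 30-BPE training segment x into (x_a, x_b) at a breakpoint
    drawn uniformly between the lo-th and hi-th BPE position, per the
    training setup quoted above."""
    breakpoint_idx = random.randint(lo, hi)  # uniform over {8, ..., 12}
    x_a = bpe_ids[:breakpoint_idx]
    x_b = bpe_ids[breakpoint_idx:]
    return x_a, x_b

# Example with a dummy 30-token segment
x = list(range(30))
x_a, x_b = split_sample(x)
print(len(x_a), len(x_b))  # e.g. 9 and 21
```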
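
The experiment-setup row specifies the base LM (GPT-2 medium, 345M) and the decoding scheme (nucleus sampling with p = 0.9). The sketch below covers only that decoding configuration using the Huggingface transformers library; the CoCon block itself, inserted between the first 7 and remaining 17 Transformer blocks, is not reproduced here and is available in the authors' repository. The prompt string and generation length are arbitrary examples.

```python
# Sketch of the paper's sampling settings: GPT-2 medium with nucleus
# sampling (top-p = 0.9). This does not include the CoCon block.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.eval()

prompt = "The latest breakthrough in machine learning"  # example prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output_ids = model.generate(
    input_ids,
    do_sample=True,      # sample from the softmax distribution
    top_p=0.9,           # nucleus sampling threshold from the paper
    max_length=80,       # arbitrary length for illustration
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```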