CoCon: A Self-Supervised Approach for Controlled Text Generation

Authors: Alvin Chan, Yew-Soon Ong, Bill Pung, Aston Zhang, Jie Fu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that CoCon can naturally incorporate target content into generated texts and control high-level text attributes in a zero-shot manner. We conduct a range of experiments on CoCon to study its control over generated texts and the quality of these texts.
Researcher Affiliation | Collaboration | Alvin Chan (Nanyang Technological University), Yew-Soon Ong (Nanyang Technological University), Bill Pung (Nanyang Technological University), Aston Zhang (Amazon AI), Jie Fu (Mila, Polytechnique Montreal)
Pseudocode | No | The paper describes the CoCon architecture and training process in text and diagrams (Figure 1, Figure 2) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and models are available at: https://github.com/alvinchangw/COCON_ICLR2021
Open Datasets | Yes | We train CoCon for 2 epochs on publicly available GPT-2 medium output texts (250K train samples) that are generated with top-40 k-sampling. The training samples (x) are 30-BPE long segments sampled from these GPT-2 output texts. (Samples from: https://github.com/openai/gpt-2-output-dataset)
Dataset Splits | Yes | The training samples (x) are 30-BPE long segments sampled from these GPT-2 output texts. Subsequently, the x^a and x^b segments are split from x at a breakpoint between the 8th to 12th BPE position, uniformly sampled during training. The content input (c) and prompt text (p) are randomly sourced from different GPT-2 output samples that are withheld from CoCon training. (See the segment-splitting sketch after the table.)
Hardware Specification | Yes | It takes less than 24 hours to train CoCon on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions software like 'GPT-2' and 'Huggingface versions' but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, or the Huggingface library).
Experiment Setup | Yes | In all our experiments, the GPT-2 medium 345M model (Radford et al., 2019) is used as the pretrained LM for CoCon. CoCon's LMα comprises the first 7 GPT-2 Transformer blocks while the remaining 17 blocks make up LMβ in our experiments. The CoCon block's architecture mirrors a single GPT-2 Transformer block with a dimension size of 1024. We train CoCon for 2 epochs on publicly available GPT-2 medium output texts... The training samples (x) are 30-BPE long segments... the x^a and x^b segments are split from x at a breakpoint between the 8th to 12th BPE position, uniformly sampled during training. The discriminator (f_disc) consists of a 1-D convolutional layer, followed by a linear layer with 2 class outputs, and is trained once for every 5 CoCon training steps. To simplify hyperparameter tuning, we set λ = 1 for all four CoCon loss terms and τ_content = 0 for our results. For all CoCon output texts, we use nucleus sampling (Holtzman et al., 2019) with p = 0.9 to draw the next token from the vocabulary's softmax distribution.
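
The segment sampling and splitting described in the Dataset Splits row maps onto a few lines of code. The snippet below is a minimal sketch, assuming the Huggingface GPT-2 BPE tokenizer; the function name and return convention are illustrative and not taken from the official repository.

```python
# Minimal sketch of CoCon training-sample preparation: a 30-BPE segment x is
# drawn from a GPT-2 output text and split into (x^a, x^b) at a breakpoint
# sampled uniformly from BPE positions 8-12. Illustrative only.
import random
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")

def make_training_sample(text, seg_len=30, split_lo=8, split_hi=12):
    bpe_ids = tokenizer.encode(text)
    if len(bpe_ids) < seg_len:
        return None                                # text too short for one 30-BPE segment
    start = random.randrange(len(bpe_ids) - seg_len + 1)
    x = bpe_ids[start:start + seg_len]             # the 30-BPE training segment x
    bp = random.randint(split_lo, split_hi)        # breakpoint, uniform over {8, ..., 12}
    return x[:bp], x[bp:]                          # x^a, x^b
```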
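The Experiment Setup row specifies how GPT-2 medium's 24 Transformer blocks are partitioned: the first 7 form LMα, the remaining 17 form LMβ, and a CoCon block mirroring a single 1024-dimensional GPT-2 block sits between them, with a 1-D conv plus 2-class linear discriminator f_disc. The sketch below shows one way to set this up with the Huggingface transformers library; it is a rough sketch under stated assumptions, not the authors' implementation, and the discriminator's channel count, kernel size, and pooling are choices the paper does not specify.

```python
# Sketch of the LM_alpha / LM_beta split and the discriminator shape.
import torch.nn as nn
from transformers import GPT2LMHeadModel
from transformers.models.gpt2.modeling_gpt2 import GPT2Block  # recent Huggingface versions

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2-medium")  # 345M parameters, 24 blocks, hidden size 1024
blocks = gpt2.transformer.h                            # the 24 GPT-2 Transformer blocks

lm_alpha = nn.ModuleList(blocks[:7])   # LM_alpha: first 7 blocks, encode text into intermediate representations
lm_beta = nn.ModuleList(blocks[7:])    # LM_beta: remaining 17 blocks, decode back to logits

# The CoCon block mirrors a single GPT-2 Transformer block (dimension 1024);
# reusing the GPT-2 medium config is one way to obtain that shape.
cocon_block = GPT2Block(gpt2.config)

class Discriminator(nn.Module):
    """f_disc: a 1-D convolutional layer followed by a 2-class linear layer.
    Channel count, kernel size, and mean-pooling are assumptions; the paper
    only states the conv + linear structure."""
    def __init__(self, hidden_dim=1024, channels=64, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(hidden_dim, channels, kernel_size)
        self.fc = nn.Linear(channels, 2)

    def forward(self, h):                  # h: (batch, seq_len, hidden_dim) hidden states
        z = self.conv(h.transpose(1, 2))   # -> (batch, channels, seq_len - kernel_size + 1)
        z = z.mean(dim=-1)                 # pool over sequence positions
        return self.fc(z)                  # 2-class logits (real vs. CoCon-generated)
```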
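Decoding with nucleus sampling at p = 0.9 corresponds directly to the standard Huggingface generate() API. The snippet below sketches it with plain GPT-2 medium; the prompt string and maximum length are arbitrary placeholders, and the CoCon-augmented model would be sampled analogously.

```python
# Nucleus (top-p) sampling with p = 0.9, as used for all CoCon output texts.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

prompt_ids = tokenizer.encode("Scientists have discovered", return_tensors="pt")
output = model.generate(
    prompt_ids,
    do_sample=True,                       # sample from the softmax distribution rather than decoding greedily
    top_p=0.9,                            # nucleus sampling threshold (Holtzman et al., 2019)
    max_length=60,                        # illustrative length, not specified in the row above
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to avoid a warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```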