Copy is All You Need

Authors: Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to verify the effectiveness of our proposed COG. On the standard language modeling benchmark (WikiText-103), our proposed COG substantially outperforms standard baselines on automatic metrics (26.14 vs. 23.43 MAUVE (Pillutla et al., 2021)) and human evaluation (48% vs. 28% human preference). (from the Introduction and Section 4, Experimental Setup; a hedged MAUVE evaluation sketch follows the table)
Researcher Affiliation | Collaboration | Tencent AI Lab; School of Computer Science and Technology, Beijing Institute of Technology
Pseudocode | Yes | Algorithm 1: Phrase Segmentation Algorithm (a hedged greedy-segmentation sketch follows the table)
Open Source Code | Yes | Our source codes are publicly available at https://github.com/gmftbyGMFTBY/Copyisallyouneed.
Open Datasets | Yes | On the standard language modeling benchmark (WikiText-103) ... The WikiText-103 dataset (Merity et al., 2017) contains an extensive collection of Wikipedia articles with over 100 million words ... we use the English part of Law-MT (Koehn & Knowles, 2017) ... The En-Wiki corpus contains a large-scale collection of Wikipedia articles with over 3 billion words. (a hedged data-loading sketch follows the table)
Dataset Splits | Yes | Benchmark splits reported in the paper: WikiText-103 with 1,801,350 train / 3,760 dev / 4,358 test; Law-MT with 389,292 train / 2,000 dev / 2,000 test.
Hardware Specification | Yes | We train baselines and COG for 400,000 steps on 8 Tesla-V100 GPUs.
Software Dependencies | No | The paper mentions the 'Huggingface transformers package' and specific models such as the 'GPT2 model' and 'BERT-base-cased model', but does not provide version numbers for these software dependencies or other libraries.
Experiment Setup | Yes | For all the baselines, the learning rate, dropout rate, and gradient clipping are set as 5e-5, 0.1, and 1.0, respectively. Due to memory limitation, the batch size is set to contain 256 phrases. For the BERT model in the phrase encoder, the maximum sequence length is set as 256. For the GPT2 model in the prefix encoder, the maximum sequence length is set as 512. (a hedged hyperparameter sketch follows the table)
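The MAUVE numbers quoted in the Research Type row are computed with the metric of Pillutla et al. (2021). Below is a minimal sketch of how such an evaluation could be run with the open-source mauve-text package; the file names and the default GPT-2 feature model are assumptions, not details taken from the paper.

```python
# Hedged sketch: scoring model continuations against human references with MAUVE
# (Pillutla et al., 2021). Assumes `pip install mauve-text`; file names are hypothetical.
import mauve

with open("human_continuations.txt") as f:   # hypothetical reference file
    p_text = [line.strip() for line in f if line.strip()]
with open("model_continuations.txt") as f:   # hypothetical generations file
    q_text = [line.strip() for line in f if line.strip()]

# compute_mauve embeds both text sets with a GPT-2 feature model by default
# and returns an object whose .mauve field is the score (higher is better).
out = mauve.compute_mauve(p_text=p_text, q_text=q_text, device_id=0, verbose=False)
print(f"MAUVE: {out.mauve:.4f}")
```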
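The paper's Algorithm 1 segments each training document into phrases that can be copied from source documents. The sketch below is not a transcription of that algorithm; it only illustrates the general greedy longest-match idea, with the phrase inventory represented as a plain Python set for simplicity.

```python
# Hedged illustration of greedy longest-match phrase segmentation.
# `allowed_phrases` stands in for phrases found in source documents;
# tokens not covered by any phrase fall back to single-token segments.
from typing import List, Set


def greedy_segment(tokens: List[str], allowed_phrases: Set[str], max_len: int = 8) -> List[str]:
    segments, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, shrinking until one is allowed.
        for j in range(min(len(tokens), i + max_len), i + 1, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in allowed_phrases:
                match = candidate
                i = j
                break
        if match is None:          # no phrase matched: emit a single token
            match = tokens[i]
            i += 1
        segments.append(match)
    return segments


print(greedy_segment("the cat sat on the mat".split(),
                     {"the cat", "on the mat"}))
# ['the cat', 'sat', 'on the mat']
```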
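WikiText-103 (Merity et al., 2017) is publicly available and can be pulled, for example, through the Huggingface datasets library; the snippet below is a minimal sketch of that route and is not the paper's own preprocessing pipeline. The Law-MT and En-Wiki corpora are distributed separately and are not covered here.

```python
# Hedged sketch: loading the public WikiText-103 benchmark with Huggingface `datasets`.
# This only fetches the raw splits; it does not reproduce COG's document/phrase preprocessing.
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")
print({split: len(wikitext[split]) for split in ("train", "validation", "test")})
print(wikitext["train"][10]["text"][:200])  # peek at one raw line
```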
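The hyperparameters quoted in the Experiment Setup and Hardware Specification rows can be collected into a single configuration for anyone attempting a re-run. The sketch below simply records those reported values; the dataclass and field names are my own, not the paper's.

```python
# Hedged sketch: the reported COG training hyperparameters gathered in one place.
# Field names are assumptions; the numeric values come from the paper excerpts above.
from dataclasses import dataclass


@dataclass
class COGTrainingConfig:
    learning_rate: float = 5e-5        # "learning rate ... set as 5e-5"
    dropout: float = 0.1               # "dropout rate ... 0.1"
    grad_clip_norm: float = 1.0        # "gradient clipping ... 1.0"
    phrases_per_batch: int = 256       # "batch size is set to contain 256 phrases"
    phrase_encoder_max_len: int = 256  # BERT phrase encoder max sequence length
    prefix_encoder_max_len: int = 512  # GPT2 prefix encoder max sequence length
    train_steps: int = 400_000         # "train ... for 400,000 steps"
    num_gpus: int = 8                  # "8 Tesla-V100 GPUs"


config = COGTrainingConfig()
print(config)
```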