Beyond MLE: Convex Learning for Text Generation
Authors: Chenze Shao, Zhengrui Ma, Min Zhang, Yang Feng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various text generation tasks and models show the effectiveness of our approach. |
| Researcher Affiliation | Academia | 1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; 2 University of Chinese Academy of Sciences; 3 School of Future Science and Engineering, Soochow University |
| Pseudocode | No | The paper includes mathematical formulations and proofs but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/ictnlp/Convex-Learning. |
| Open Datasets | Yes | Datasets: We conduct experiments on the widely used translation benchmark WMT14 English-German (EN-DE, 4.5M)... We conduct experiments on two widely used summarization benchmarks: CNN/Daily Mail [18] and XSum [34]. |
| Dataset Splits | Yes | Datasets: We conduct experiments on the widely used translation benchmark WMT14 English-German (EN-DE, 4.5M), where the validation and test sets are newstest2013 and newstest2014, respectively. |
| Hardware Specification | Yes | The decoding speedup is measured with a batch size of 1 on GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, BPE, the GPT-2 tokenizer, and BertTokenizer, but does not provide specific version numbers for these or other key software components. |
| Experiment Setup | Yes | Detailed information regarding other training hyperparameters can be found in Table 7: settings of training hyperparameters on the WMT14 EN-DE dataset, reconstructed below. |

Table 7: Settings of training hyperparameters on the WMT14 EN-DE dataset (MLE vs. Convex columns per model).

| Hyperparameter | Transformer (MLE) | Transformer (Convex) | Vanilla-NAT (MLE) | Vanilla-NAT (Convex) | CMLM (MLE) | CMLM (Convex) | CTC (MLE) | CTC (Convex) |
|---|---|---|---|---|---|---|---|---|
| batch size | 32k | 32k | 64k | 256k | 64k | 256k | 64k | 256k |
| learning rate | 7e-4 | 2e-4 | 5e-4 | 3e-4 | 5e-4 | 3e-4 | 5e-4 | 3e-4 |
| warmup steps | 4k | 1k | 10k | 500 | 10k | 500 | 10k | 500 |
| training steps | 200k | 50k | 300k | 10k | 300k | 10k | 300k | 10k |
| dropout | 0.1 | 0.1 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.1 |
| weight decay | 0 | 0 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| label smoothing | 0.1 | 0.1 | 0.1 | 0 | 0.1 | 0 | 0.01 | 0 |
| length loss factor | - | - | 0.1 | 0.01 | 0.1 | 0.01 | - | - |
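As a rough illustration of how the Transformer column of Table 7 could be wired into a training loop, the sketch below builds an Adam optimizer (the paper names Adam) with an inverse-square-root warmup schedule; the schedule is the standard Transformer choice but an assumption here, the identifiers `TRANSFORMER_HPARAMS` and `make_optimizer` are hypothetical, and only the numeric values come from Table 7.

```python
# Minimal sketch, not the authors' code: maps the Transformer column of
# Table 7 onto a PyTorch Adam optimizer with an (assumed) inverse-sqrt
# warmup schedule. Only the numbers are taken from the paper.
import torch

TRANSFORMER_HPARAMS = {  # hypothetical container for the Table 7 values
    "mle":    {"lr": 7e-4, "warmup_steps": 4_000, "train_steps": 200_000,
               "dropout": 0.1, "weight_decay": 0.0, "label_smoothing": 0.1},
    "convex": {"lr": 2e-4, "warmup_steps": 1_000, "train_steps": 50_000,
               "dropout": 0.1, "weight_decay": 0.0, "label_smoothing": 0.1},
}

def make_optimizer(model: torch.nn.Module, setting: str = "mle"):
    """Return (optimizer, scheduler) for one Table 7 configuration."""
    hp = TRANSFORMER_HPARAMS[setting]
    opt = torch.optim.Adam(model.parameters(), lr=hp["lr"],
                           weight_decay=hp["weight_decay"])
    warmup = hp["warmup_steps"]

    def inv_sqrt(step: int) -> float:
        # Linear warmup to the base lr, then inverse-square-root decay.
        step = max(step, 1)
        return min(step / warmup, (warmup / step) ** 0.5)

    return opt, torch.optim.lr_scheduler.LambdaLR(opt, inv_sqrt)
```

Calling the scheduler's `step()` once per update for `train_steps` updates reproduces the warmup/decay shape; how the 32k-token batches were assembled (e.g., via gradient accumulation) is not specified here.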