Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation

Authors: Pei Ke, Haozhe Ji, Zhenyu Yang, Yi Huang, Junlan Feng, Xiaoyan Zhu, Minlie Huang

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation."
Researcher Affiliation | Collaboration | "1 CoAI Group, DCST, IAI, BNRIST, Tsinghua University, Beijing, China; 2 OPPO Mobile Telecommunications Corp., Ltd, China; 3 JIUTIAN Team, China Mobile Research Institute, Beijing 100053, China; 4 Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute, Beijing, China"
Pseudocode | Yes | "Algorithm 1 Curriculum-Based Self-Training (CBST)" (an illustrative sketch of such a loop is given after the table)
Open Source Code | Yes | "The codes are available at https://github.com/kepei1106/CBST."
Open Datasets | Yes | "WebNLG. This dataset aims to generate textual descriptions for RDF triples [Shimorina and Gardent, 2018]. ... WikiBio. This dataset aims to generate the first sentence of biography descriptions for Wikipedia tables [Lebret et al., 2016]. ... We further constructed the unlabeled dataset for each benchmark dataset based on GenWiki [Jin et al., 2020]."
Dataset Splits | Yes | "The number of instances in the training / validation / test set is 34,352 / 4,316 / 4,224, respectively. We followed the existing works [Chen et al., 2020a] to pre-process this dataset and use 0.5%, 1%, 5%, 10% of the training instances as the labeled datasets in the few-shot setting." (the implied few-shot subset sizes are worked out after the table)
Hardware Specification | No | "The base version of BART was adopted because of the limited computational resources." (This statement is too vague to be considered a specific hardware detail.)
Software Dependencies | No | "As for the model structure, we used BART [Lewis et al., 2020] as the text-to-text pre-trained model in our experiments. The base version of BART was adopted because of the limited computational resources. We followed BART to use Byte-Pair Encoding vocabulary with the size of 50,265." (The paper mentions BART and Byte-Pair Encoding but does not provide specific version numbers for any software dependencies; a possible, unverified setup is sketched after the table.)
Experiment Setup | Yes | "In our self-training algorithm, we set the number of curricula M_C to be 3. ... For the hyper-parameters to select pseudo-labeled data, we set ϵ_cov = 1.0, ϵ_gen = 50%. The probabilities of word substitution and triple reordering were set to p_word = p_triple = 0.4. ... The training epoch at each iteration was set to be 20. The learning rate was 0.00003. The batch size was 32 / 24 for WebNLG / WikiBio, respectively. The maximum length of linearized structured data was 256 / 384 for WebNLG / WikiBio, respectively, while the length of text sequences was 128." (these values are collected into a configuration sketch after the table)
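
The page only records the name of the paper's pseudocode (Algorithm 1, in the Pseudocode row above). As a rough orientation, the following is a minimal sketch of a generic curriculum-based self-training loop, not a reproduction of the authors' Algorithm 1; the helpers difficulty, train_model, generate, and select are hypothetical placeholders for the paper's actual training, pseudo-labeling, and filtering steps.

```python
# Minimal, illustrative curriculum-based self-training loop.
# NOT the authors' Algorithm 1; all helpers below are placeholders.

def difficulty(example):
    # Assumption: treat inputs with more triples as harder.
    return len(example["triples"])

def train_model(model, pairs):
    # Placeholder: fine-tune the text-to-text model on (input, text) pairs.
    return model

def generate(model, example):
    # Placeholder: decode a pseudo target and attach a confidence score.
    return {"triples": example["triples"], "text": "", "score": 0.0}

def select(pseudo, eps_gen=0.5):
    # Placeholder for the paper's pseudo-label filtering, which uses coverage and
    # generation-score thresholds (eps_cov = 1.0, eps_gen = 50%); here we simply
    # keep the top-scoring eps_gen fraction.
    ranked = sorted(pseudo, key=lambda p: p["score"], reverse=True)
    return ranked[: int(len(ranked) * eps_gen)]

def cbst(model, labeled, unlabeled, num_curricula=3):
    model = train_model(model, labeled)        # initial fine-tuning on labeled data
    pool = sorted(unlabeled, key=difficulty)   # arrange the unlabeled pool easy-to-hard
    step = max(1, len(pool) // num_curricula)
    for c in range(1, num_curricula + 1):
        subset = pool[: c * step] if c < num_curricula else pool
        pseudo = [generate(model, x) for x in subset]          # pseudo-label current slice
        model = train_model(model, labeled + select(pseudo))   # retrain on the union
    return model
```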
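
For the Dataset Splits row, the few-shot label budgets implied by the quoted percentages of the 34,352 WebNLG training instances can be worked out directly (a back-of-the-envelope calculation; the paper's own subsampling may round differently):

```python
# Rough few-shot subset sizes implied by the reported 34,352 training instances.
train_size = 34_352
for frac in (0.005, 0.01, 0.05, 0.10):
    print(f"{frac:.1%} -> ~{int(train_size * frac)} labeled instances")
# Prints roughly 171, 343, 1717, and 3435 instances for 0.5%, 1%, 5%, and 10%.
```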
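
For the Software Dependencies row, a reproduction would have to choose and pin framework versions itself. The snippet below shows one plausible setup using Hugging Face Transformers, which is an assumption (the paper does not name a framework); it only matches the stated BART-base checkpoint and 50,265-token BPE vocabulary.

```python
# Hypothetical dependency setup for a reproduction; the paper only states "BART (base)"
# and a 50,265-token BPE vocabulary, not the framework or versions.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

print(len(tokenizer))  # expected to match the 50,265-token vocabulary reported in the paper
```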
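
Finally, the hyper-parameters quoted in the Experiment Setup row can be collected into a single reference configuration (values are copied from the quote; the key names are ours, not the authors'):

```python
# Hyper-parameters quoted from the paper's experiment setup; key names are ours.
CBST_CONFIG = {
    "num_curricula": 3,          # M_C
    "eps_cov": 1.0,              # coverage threshold for pseudo-label selection
    "eps_gen": 0.50,             # generation-score threshold (50%)
    "p_word": 0.4,               # word-substitution probability
    "p_triple": 0.4,             # triple-reordering probability
    "epochs_per_iteration": 20,
    "learning_rate": 3e-5,
    "batch_size": {"WebNLG": 32, "WikiBio": 24},
    "max_source_length": {"WebNLG": 256, "WikiBio": 384},  # linearized structured data
    "max_target_length": 128,    # text sequences
}
```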