Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation
Authors: Pei Ke, Haozhe Ji, Zhenyu Yang, Yi Huang, Junlan Feng, Xiaoyan Zhu, Minlie Huang
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation. |
| Researcher Affiliation | Collaboration | (1) CoAI Group, DCST, IAI, BNRIST, Tsinghua University, Beijing, China; (2) OPPO Mobile Telecommunications Corp., Ltd, China; (3) JIUTIAN Team, China Mobile Research Institute, Beijing 100053, China; (4) Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Curriculum-Based Self-Training (CBST) |
| Open Source Code | Yes | The codes are available at https://github.com/kepei1106/CBST. |
| Open Datasets | Yes | WebNLG. This dataset aims to generate textual descriptions for RDF triples [Shimorina and Gardent, 2018]. ... WikiBio. This dataset aims to generate the first sentence of biography descriptions for Wikipedia tables [Lebret et al., 2016]. ... We further constructed the unlabeled dataset for each benchmark dataset based on GenWiki [Jin et al., 2020]. |
| Dataset Splits | Yes | The number of instances in training / validation / test set is 34,352 / 4,316 / 4,224, respectively. We followed the existing works [Chen et al., 2020a] to pre-process this dataset and use 0.5%, 1%, 5%, 10% of the training instances as the labeled datasets in the few-shot setting. (A split-sampling sketch follows the table.) |
| Hardware Specification | No | The base version of BART was adopted because of the limited computational resources. (This statement is too vague to be considered a specific hardware detail.) |
| Software Dependencies | No | As for the model structure, we used BART [Lewis et al., 2020] as the text-to-text pre-trained model in our experiments. The base version of BART was adopted because of the limited computational resources. We followed BART to use Byte-Pair Encoding vocabulary with the size of 50,265. (The paper mentions BART and Byte-Pair Encoding but does not provide specific version numbers for any software dependencies.) |
| Experiment Setup | Yes | In our self-training algorithm, we set the number of curriculum M_C to be 3. ... For the hyper-parameters to select pseudo-labeled data, we set ϵ_cov = 1.0, ϵ_gen = 50%. The probabilities of word substitution and triple reordering were set to p_word = p_triple = 0.4. ... The training epoch at each iteration was set to be 20. The learning rate was 0.00003. The batch size was 32 / 24 for WebNLG / WikiBio, respectively. The maximum length of linearized structured data was 256 / 384 for WebNLG / WikiBio, respectively, while the length of text sequences was 128. (A hedged sketch of the CBST loop using these hyper-parameters follows the table.) |
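To make the "Dataset Splits" row concrete, the sketch below shows one way the 0.5% / 1% / 5% / 10% few-shot labeled subsets could be drawn from the training split. The function name `make_few_shot_splits`, the seed, and uniform random sampling are assumptions for illustration; they are not taken from the paper or the pre-processing of [Chen et al., 2020a].

```python
import random
from typing import Any, Dict, List, Tuple


def make_few_shot_splits(train_set: List[Any],
                         fractions: Tuple[float, ...] = (0.005, 0.01, 0.05, 0.10),
                         seed: int = 42) -> Dict[float, List[Any]]:
    """Draw fixed-fraction random subsets of the training split to serve as
    few-shot labeled sets (seed and sampling strategy are illustrative)."""
    rng = random.Random(seed)
    splits = {}
    for frac in fractions:
        k = max(1, int(len(train_set) * frac))
        splits[frac] = rng.sample(train_set, k)
    return splits


# With the quoted WebNLG training size of 34,352 instances, the 0.5% subset
# would contain int(34352 * 0.005) = 171 labeled examples.
```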
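The "Pseudocode" and "Experiment Setup" rows together suggest the overall shape of Algorithm 1, so a minimal Python sketch of a curriculum-based self-training loop is given below. Only the hyper-parameter values (M_C = 3, ϵ_cov = 1.0, ϵ_gen = 50%, 20 epochs per iteration, learning rate 3e-5) come from the table; every function name, the difficulty measure, and the filtering order are assumptions, not the authors' released implementation (see the GitHub link above for that).

```python
from typing import Any, Callable, List, Tuple

# Values quoted in the "Experiment Setup" row; everything else is assumed.
M_C = 3                 # number of curriculum stages
EPS_COV = 1.0           # coverage threshold epsilon_cov
EPS_GEN = 0.5           # keep the top 50% by generation score (epsilon_gen)
EPOCHS_PER_ITER = 20
LEARNING_RATE = 3e-5


def cbst(model: Any,
         labeled: List[Tuple[Any, str]],
         unlabeled: List[Any],
         *,
         train_fn: Callable,         # train_fn(model, data, epochs, lr) -> model
         pseudo_label_fn: Callable,  # pseudo_label_fn(model, inputs) -> [(input, text), ...]
         coverage_fn: Callable,      # coverage_fn((input, text)) -> float
         gen_score_fn: Callable,     # gen_score_fn((input, text)) -> float
         difficulty_fn: Callable):   # difficulty_fn(input) -> float
    """Curriculum-based self-training sketch: pseudo-label the unlabeled pool
    stage by stage (easy to hard), keep only well-covered, fluent outputs,
    and retrain on the growing training set."""
    # Order unlabeled inputs by a difficulty measure and split them into
    # M_C stages; the remainder goes into the last, hardest stage.
    ordered = sorted(unlabeled, key=difficulty_fn)
    stage_size = max(1, len(ordered) // M_C)
    stages = [ordered[i * stage_size:(i + 1) * stage_size] for i in range(M_C - 1)]
    stages.append(ordered[(M_C - 1) * stage_size:])

    train_set = list(labeled)
    for stage in stages:
        # Train on the labeled data plus the pseudo-labeled data kept so far.
        model = train_fn(model, train_set, epochs=EPOCHS_PER_ITER, lr=LEARNING_RATE)

        # Pseudo-label the inputs of the current curriculum stage.
        pseudo = pseudo_label_fn(model, stage)

        # Filter: require the generated text to cover the input structured
        # data (epsilon_cov), then keep the top epsilon_gen fraction ranked
        # by a generation (fluency) score.
        covered = [pair for pair in pseudo if coverage_fn(pair) >= EPS_COV]
        covered.sort(key=gen_score_fn, reverse=True)
        train_set.extend(covered[:int(len(covered) * EPS_GEN)])

    # Final pass over the full labeled + pseudo-labeled training set.
    return train_fn(model, train_set, epochs=EPOCHS_PER_ITER, lr=LEARNING_RATE)
```

The noising hyper-parameters quoted above (p_word = p_triple = 0.4 for word substitution and triple reordering) would belong to the pseudo-labeled data construction step (here folded into `pseudo_label_fn`); they are omitted from the sketch to keep it short.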