A Branching Decoder for Set Generation

Authors: Zixian Huang, Gengyang Xiao, Yu Gu, Gong Cheng

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on several keyphrase generation datasets demonstrate that the branching decoder is more effective and efficient than the existing sequential decoder. Across three representative keyphrase generation benchmarks, One2Branch outperforms established One2Seq methods in both effectiveness and efficiency.
Researcher Affiliation Academia Zixian Huang, Gengyang Xiao (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; {zixianhuang, gyxiao}@smail.nju.edu.cn); Yu Gu (The Ohio State University, Columbus, USA; gu.826@osu.edu); Gong Cheng (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; gcheng@nju.edu.cn)
Pseudocode Yes Algorithm 1 (Decoding). Input: X, step_max, k_min, k_max
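For illustration, the following is a minimal Python sketch of what such a branching decoding loop could look like; it is not the authors' implementation. The helper score_next_tokens(X, prefix), the probability threshold, and the fallback to the single best token are assumptions; step_max, k_min, and k_max follow the inputs of Algorithm 1. The idea sketched here is that the first step keeps at least k_min (at most k_max) start tokens, later steps expand every continuation above the threshold, and each root-to-leaf path of the resulting tree is one generated keyphrase.

    # Minimal sketch of a branching decoding loop (not the authors' implementation).
    # Assumption: score_next_tokens(X, prefix) is a hypothetical hook returning
    # {token: probability} for the next position given source X and a decoded prefix.
    EOS = "</s>"

    def branching_decode(X, score_next_tokens, step_max=20, k_min=8, k_max=60, threshold=0.5):
        finished = []  # completed keyphrases (token lists)
        ranked = sorted(score_next_tokens(X, []).items(), key=lambda kv: -kv[1])
        kept = [t for t, _ in ranked[:k_min]]                        # always keep k_min roots
        kept += [t for t, p in ranked[k_min:k_max] if p > threshold]  # extra roots above threshold
        branches = [[t] for t in kept]

        for _ in range(step_max - 1):
            if not branches:
                break
            next_branches = []
            for prefix in branches:
                scores = score_next_tokens(X, prefix)
                # expand every continuation whose probability clears the threshold,
                # falling back to the single best token if none does
                expansions = [t for t, p in scores.items() if p > threshold] or \
                             [max(scores, key=scores.get)]
                for t in expansions:
                    if t == EOS:
                        finished.append(prefix)      # branch ends: one keyphrase done
                    else:
                        next_branches.append(prefix + [t])
            branches = next_branches
        return finished + branches                   # leftovers are truncated at step_max

Because the branches are independent prefixes, their expansions could in principle be scored in one batched forward pass per step, which is presumably where an efficiency gain over sequential One2Seq decoding would come from; the exact batching strategy is defined by the released code, not by this sketch.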
Open Source Code Yes Code: https://github.com/nju-websoft/One2Branch
Open Datasets Yes Keyphrase generation (KG) is a classic task of set generation with rich experimental data. We selected three large-scale KG datasets as our main experimental data: KP20k (Meng et al., 2017), KPTimes (Gallina et al., 2019), and Stack Ex (Yuan et al., 2020), which are from the fields of science, news and online forums, respectively.
Dataset Splits Yes Dataset statistics are shown in Table 1. # KP, |KP|, and % Abs KP refer to the average number of keyphrases per document, the average number of words per keyphrase, and the percentage of absent keyphrases, respectively. All of them are calculated over the dev set.
Table 1:
Dataset    Field    # Train  # Dev  # Test  # KP  |KP|  % Abs KP
KP20k      Science  509 K    20 K   20 K    5.3   2.1   39.8
KPTimes    News     259 K    10 K   20 K    5.0   2.2   56.4
Stack Ex   Forum    298 K    16 K   16 K    2.7   1.3   46.5
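As a sketch of how these dev-set statistics could be recomputed (the field names doc and keyphrases are placeholders, and "absent" is taken to mean the keyphrase does not occur verbatim in the source document, the usual definition in keyphrase generation work):

    # Sketch: compute # KP, |KP|, and % Abs KP over a dev split.
    def dev_statistics(examples):
        n_kp, kp_len, absent = [], [], []
        for ex in examples:
            doc = ex["doc"].lower()
            kps = ex["keyphrases"]
            n_kp.append(len(kps))
            for kp in kps:
                kp_len.append(len(kp.split()))
                absent.append(kp.lower() not in doc)
        return {
            "# KP": sum(n_kp) / len(n_kp),                 # avg keyphrases per document
            "|KP|": sum(kp_len) / len(kp_len),             # avg words per keyphrase
            "% Abs KP": 100 * sum(absent) / len(absent),   # share of absent keyphrases
        }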
Hardware Specification Yes We used gradient accumulation 64, trained One2Branch based on T5-Base (223 M) on a single RTX 4090 (24 GB), and trained One2Branch based on T5-Large (738 M) on two RTX 4090 GPUs. For inference, we ran both the base and large versions on a single RTX 4090.
Software Dependencies Yes We implemented our One2Branch scheme based on the code of Hugging Face Transformers 4.12.5 and used T5 (Raffel et al., 2020) as the backbone. We also implemented our One2Branch scheme based on MindSpore 2.0.
Experiment Setup Yes For the two stages of training, we trained 15 epochs in the first stage and 5 epochs in the second stage. We set step_max = 20 to ensure that step_max is greater than the length of all keyphrases on all dev sets. We set k_max = 60 to ensure that k_max is greater than the number of keyphrases of every document on all dev sets. We tuned k_min on each dev set from 1 to 15 to maximize the sum of all metrics, and the best k_min was used on the test set. On all three datasets, the best performance was achieved with k_min = 8. We followed the setting of Wu et al. (2022) using batch size 64, learning rate 1e-4, maximum sequence length 512, and the AdamW optimizer. We used three seeds {0, 1, 2} and took the mean results. We used gradient accumulation 64...
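A minimal sketch of this reported configuration with Hugging Face Transformers is given below. The two-stage training loop and the branching objective are specific to the authors' released code and are not reproduced here; the function name and the config dictionary are illustrative only, and only the hyperparameter values quoted above are taken from the paper.

    # Sketch of the reported fine-tuning setup (T5 backbone, AdamW, lr 1e-4,
    # max length 512, batch size 64, gradient accumulation 64, seeds {0, 1, 2}).
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    def build_trainer_components(model_name="t5-base", seed=0):
        torch.manual_seed(seed)                       # one of the seeds {0, 1, 2}
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = T5ForConditionalGeneration.from_pretrained(model_name)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        config = {
            "max_seq_length": 512,
            "batch_size": 64,
            "gradient_accumulation_steps": 64,
            "epochs_stage1": 15,   # first training stage
            "epochs_stage2": 5,    # second training stage
            "step_max": 20, "k_min": 8, "k_max": 60,
        }
        return tokenizer, model, optimizer, config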