A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization
Authors: Puyuan Liu, Xiang Zhang, Lili Mou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our NACC model on the Gigaword headline generation [10] and DUC2004 [27] datasets in two settings: supervised and unsupervised. Experiments show that NACC establishes the state-of-the-art performance of non-autoregressive summarization under various target lengths in both settings; NACC even outperforms autoregressive Transformers [37] in the unsupervised setting, where the input and output have stronger correspondence. |
| Researcher Affiliation | Academia | Puyuan Liu, Xiang Zhang, Lili Mou; Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada; Canada CIFAR AI Chair, Amii; {puyuan, xzhang23}@ualberta.ca, doublepower.mou@gmail.com |
| Pseudocode | No | The paper describes the proposed dynamic programming algorithm in detail using text and equations (e.g., Section 3.2, Figure 1b) but does not present it in a formally labeled 'Pseudocode' or 'Algorithm' block. A hedged sketch of such a length-control decoding procedure is given after the table. |
| Open Source Code | Yes | Our code, model, and output are released at: https://github.com/MANGA-UOFA/NACC |
| Open Datasets | Yes | Our model is evaluated on the Gigaword headline generation [30] and DUC2004 datasets [27]. |
| Dataset Splits | Yes | In total, the dataset contains 3.8M, 198K, and 1951 samples for training, validation, and test, respectively. |
| Hardware Specification | Yes | All experiments were run on an i9-9940X CPU and an RTX6000 GPU. |
| Software Dependencies | No | The paper mentions using a 'Transformer encoder' and references its original paper ('Attention is all you need' [37]), but it does not specify any ancillary software names with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed for replication. |
| Experiment Setup | Yes | We use a Transformer encoder as the base model, which has 6 layers and 8 attention heads for each layer, following the settings in [37]. The dimensions are 512 and 2048 for the attention and feed-forward modules, respectively. Each training batch contains samples amounting to 4K tokens. The learning rate is chosen from {1e-4, 5e-4} by validation, and we ran 100K gradient updates for the unsupervised setting, but 400K updates for the supervised setting. For our length-control algorithm, we adopt a bucket size of 4, and only consider the most probable 20 words for every generation slot (cf. ws in Eqn. 6) due to efficiency concerns. (A configuration sketch based on these quoted hyperparameters follows the table.) |
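The quoted setup pins down the encoder hyperparameters (6 layers, 8 heads, 512/2048 dimensions) but not the framework or the surrounding model code. The snippet below is a minimal sketch of that configuration using stock PyTorch modules; the vocabulary size, the embedding and projection layers, and the omission of positional encodings are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of the encoder configuration quoted in the "Experiment Setup"
# row, expressed with stock PyTorch modules. Only the hyperparameters
# (6 layers, 8 heads, 512/2048 dimensions) come from the paper; the rest
# (vocabulary size, embedding, output head) is assumed for illustration.

import torch
import torch.nn as nn

VOCAB_SIZE = 32_000  # placeholder; not specified in the quoted setup

encoder_layer = nn.TransformerEncoderLayer(
    d_model=512,           # attention dimension reported in the paper
    nhead=8,               # 8 attention heads per layer
    dim_feedforward=2048,  # feed-forward dimension reported in the paper
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)  # 6 layers

embedding = nn.Embedding(VOCAB_SIZE, 512)
vocab_projection = nn.Linear(512, VOCAB_SIZE)  # per-slot token logits

# Forward pass over a dummy batch of token ids.
tokens = torch.randint(0, VOCAB_SIZE, (2, 30))         # (batch, source length)
logits = vocab_projection(encoder(embedding(tokens)))  # (batch, slots, vocab)
```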
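The length-control algorithm itself is only summarized above (dynamic programming over a character budget, a bucket size of 4, and the top 20 candidate words per generation slot). The following is a minimal sketch of one way such a bucketed dynamic program could look, assuming a CTC-style decoder in which each slot emits either a blank or one of its top-k candidate words. Function and variable names are hypothetical, and the released NACC code may differ, e.g., in blank handling, repeated-token collapsing, and how exact lengths map to buckets.

```python
# Illustrative sketch (NOT the authors' implementation) of a bucketed dynamic
# program for character-level length control. Assumes each slot's candidate
# list already contains the blank token "" alongside its top-k words.

import math
from typing import List, Tuple

BLANK = ""  # blank emission: contributes no characters to the output


def length_control_decode(
    slot_candidates: List[List[Tuple[str, float]]],  # per slot: [(word, log_prob), ...]
    budget_chars: int,                               # maximum summary length in characters
    bucket_size: int = 4,                            # the paper buckets lengths with size 4
) -> List[str]:
    num_buckets = budget_chars // bucket_size + 1
    neg_inf = -math.inf

    # dp[b] = (best log-prob, words emitted so far) with total length in bucket b
    dp = [(neg_inf, [])] * num_buckets
    dp[0] = (0.0, [])

    for candidates in slot_candidates:
        new_dp = [(neg_inf, [])] * num_buckets
        for b, (score, words) in enumerate(dp):
            if score == neg_inf:
                continue
            for word, logp in candidates:
                if word == BLANK:
                    nb, nwords = b, words  # blank adds no characters
                else:
                    # assumption: each word costs len(word) + 1 chars (word + space),
                    # added to the bucket's lower bound (the bucketing approximation)
                    chars = b * bucket_size + len(word) + 1
                    if chars > budget_chars:
                        continue  # would exceed the character budget
                    nb, nwords = chars // bucket_size, words + [word]
                if score + logp > new_dp[nb][0]:
                    new_dp[nb] = (score + logp, nwords)
        dp = new_dp

    # return the best-scoring summary over all admissible length buckets
    best = max(dp, key=lambda t: t[0])
    return best[1]


# Toy usage: three slots, blank plus two word candidates each (made-up log-probs).
slots = [
    [("", -2.0), ("police", -0.3), ("officers", -1.0)],
    [("", -1.5), ("arrest", -0.4), ("detain", -0.9)],
    [("", -0.2), ("suspect", -0.6), ("man", -0.8)],
]
print(length_control_decode(slots, budget_chars=16))  # -> ['police', 'arrest']
```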