SegFormer: A Topic Segmentation Model with Controllable Range of Attention

Authors: Haitao Bai, Pinghui Wang, Ruofei Zhang, Zhou Su

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate SegFormer's generalization ability, multilingual ability, and application ability on multiple challenging real-world datasets. Experiments show that our model significantly improves the performance by 7.5% on the benchmark WIKI-SECTION compared to several strong baselines. Experiments: To comprehensively evaluate the effectiveness of our model, we conduct multiple sets of evaluation experiments.
Researcher Affiliation | Academia | Haitao Bai, Pinghui Wang*, Ruofei Zhang, Zhou Su; MOE Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University; haitao.bai@stu.xjtu.edu.cn, phwang@mail.xjtu.edu.cn, rfzhang@gmail.com, zhousu@ieee.org
Pseudocode | No | The paper describes algorithms in paragraph form (e.g., 'Inference Strategy', 'Training Strategy') and illustrates them with a diagram (Figure 4), but does not provide structured pseudocode or a formally labeled algorithm block.
Open Source Code | Yes | We make our source code and datasets publicly available to facilitate future study. Footnote: https://github.com/nlgandnlu/SegFormer
Open Datasets | Yes | WIKI-SECTION (Arnold et al. 2019) is generated from the Wikipedia dumps and is a large-scale multi-domain and multilingual dataset. It covers two domains (cities and diseases) and two languages (English and German). Advertisements is a Chinese advertising dataset. We use 3313 advertorials that label each sentence as an advertising sentence or not in the dataset. The numbers of documents for training, validation, and testing are 2319, 331, and 663, respectively. Footnote: https://github.com/zhanzecheng/SOHU_competition
Dataset Splits | Yes | The numbers of documents for training, validation, and testing are 2319, 331, and 663, respectively.
Hardware Specification | No | No specific hardware details (like GPU/CPU models or processor types) are mentioned for the experimental setup.
Software Dependencies | No | The paper mentions using 'Bert-base' and the 'Adam optimizer' but does not specify their version numbers or the versions of underlying software frameworks like PyTorch or TensorFlow.
Experiment Setup | Yes | We use the pre-trained model Bert-base for English datasets and German Bert for German datasets. The dimension of token embedding is 768, and the size of the dictionary is 30,522. The sentence contextualization encoder has 2 layers with 12 self-attention heads. We have used the Adam optimizer with the learning rate being 0.00001 for BERT and 0.0001 for sentence contextualization encoder and context aggregator. The dropout rate is 0.1. The tunable scalar α is 1. The batch size is 32 and we train our model for 20 epochs. The mask epoch = [0, 2, 6, 10, 20].
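
The Experiment Setup row quotes concrete hyperparameters: two separate Adam learning rates (1e-5 for BERT, 1e-4 for the sentence contextualization encoder and context aggregator), dropout 0.1, batch size 32, 20 training epochs, and a mask-epoch schedule. The sketch below shows one way such per-module learning rates could be configured in PyTorch. The module layout (SegModel, sentence_encoder, context_aggregator) and the bert-base-uncased checkpoint are assumptions for illustration only, not taken from the authors' repository.

```python
import torch
from torch.optim import Adam
from transformers import BertModel


class SegModel(torch.nn.Module):
    """Hypothetical module layout; names are illustrative, not from the authors' code."""

    def __init__(self):
        super().__init__()
        # Assumed English checkpoint; the paper uses German BERT for German datasets.
        # bert-base has 768-dim token embeddings and a 30,522-token vocabulary.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # "2 layers with 12 self-attention heads" for sentence contextualization.
        enc_layer = torch.nn.TransformerEncoderLayer(
            d_model=768, nhead=12, dropout=0.1, batch_first=True
        )
        self.sentence_encoder = torch.nn.TransformerEncoder(enc_layer, num_layers=2)
        # Placeholder for the context aggregator / classification head.
        self.context_aggregator = torch.nn.Linear(768, 2)


model = SegModel()

# Separate learning rates as reported: 1e-5 for BERT, 1e-4 for the other modules.
optimizer = Adam(
    [
        {"params": model.bert.parameters(), "lr": 1e-5},
        {"params": model.sentence_encoder.parameters(), "lr": 1e-4},
        {"params": model.context_aggregator.parameters(), "lr": 1e-4},
    ]
)

BATCH_SIZE = 32
NUM_EPOCHS = 20
MASK_EPOCHS = [0, 2, 6, 10, 20]  # schedule quoted in the paper; exact use depends on the authors' code
```

Grouping parameters this way fine-tunes the pre-trained BERT encoder with a smaller learning rate than the randomly initialized components, matching the quoted setup.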
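
The Open Datasets and Dataset Splits rows give absolute document counts for the Advertisements corpus. A quick sanity check (only the counts come from the quoted text; the dictionary layout is illustrative) confirms that the splits sum to the stated 3313 advertorials and amount to roughly a 70/10/20 partition:

```python
# Split sizes quoted in the Dataset Splits row; only these counts come from the paper.
SPLITS = {"train": 2319, "validation": 331, "test": 663}

total = sum(SPLITS.values())
assert total == 3313, f"expected 3313 labeled advertorials, got {total}"

# Report each split's share of the corpus (works out to roughly 70% / 10% / 20%).
for name, count in SPLITS.items():
    print(f"{name}: {count} documents ({count / total:.1%})")
```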