Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Authors: Goran Glavaš, Swapna Somasundaran

AAAI 2020, pp. 7797-7804

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show (1) that even without the auxiliary coherence objective, the Two-Level Transformer model for Text Segmentation (TLT-TS) yields state-of-the-art performance across multiple benchmarks, (2) that the full CATS model, with the auxiliary coherence modeling, further significantly improves the segmentation, and (3) that both TLT-TS and CATS are robust in domain transfer. Furthermore, we demonstrate the models' effectiveness in zero-shot language transfer.
Researcher Affiliation | Collaboration | Goran Glavaš,1 Swapna Somasundaran2; 1Data and Web Science Research Group, University of Mannheim, goran@informatik.uni-mannheim.de; 2Educational Testing Service (ETS), ssomasundaran@ets.org
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code, nor does it state that the code will be released.
Open Datasets | Yes | Koshorek et al. (2018) leveraged the manual structuring of Wikipedia pages into sections to automatically create a large segmentation-annotated corpus. WIKI-727K consists of 727,746 documents created from English (EN) Wikipedia pages, divided into training (80%), development (10%), and test portions (10%). We train, optimize, and evaluate our models on respective portions of the WIKI-727K dataset.
Dataset Splits | Yes | WIKI-727K consists of 727,746 documents created from English (EN) Wikipedia pages, divided into training (80%), development (10%), and test portions (10%).
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper mentions 'FASTTEXT word embeddings' and 'Adam optimization algorithm' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We found the following configuration to lead to robust performance for both TLT-TS and CATS: (1) training instance preparation: snippet size of K = 16 sentences with T = 50 tokens; scrambling probabilities p1 = p2 = 0.5; (2) configuration of Transformers: N_TT = N_TS = 6 layers, with 4 attention heads per layer in both transformers; (3) other model hyperparameters: positional embedding size of d_p = 10; coherence objective contrastive margin of δ_coh = 1. We found different optimal inference thresholds: τ = 0.5 for the segmentation-only TLT-TS model and τ = 0.3 for the coherence-aware CATS model. We trained both TLT-TS and CATS in batches of N = 32 snippets (each with K = 16 sentences), using the Adam optimization algorithm (Kingma and Ba 2014) with the initial learning rate set to 10^-4.
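
To make the quoted setup easier to scan, here is a minimal sketch of the reported hyperparameters collected into a single Python configuration object. The class and field names (e.g. CATSConfig, snippet_size_k) are illustrative assumptions, not identifiers from the authors' code (which, per the table above, is not publicly available); only the numeric values come from the quoted Experiment Setup.

```python
from dataclasses import dataclass


@dataclass
class CATSConfig:
    """Hyperparameters reported in the Experiment Setup row (names are illustrative)."""
    # (1) Training instance preparation
    snippet_size_k: int = 16              # K: sentences per snippet
    tokens_per_sentence_t: int = 50       # T: tokens kept per sentence
    scramble_p1: float = 0.5              # scrambling probability p1
    scramble_p2: float = 0.5              # scrambling probability p2
    # (2) Transformer configuration
    token_transformer_layers: int = 6     # N_TT
    sentence_transformer_layers: int = 6  # N_TS
    attention_heads: int = 4              # heads per layer in both transformers
    # (3) Other model hyperparameters
    positional_embedding_dim: int = 10    # d_p
    coherence_margin: float = 1.0         # delta_coh
    # Optimization
    batch_size_snippets: int = 32         # N: snippets per batch
    adam_learning_rate: float = 1e-4      # initial learning rate for Adam
    # Inference thresholds
    tau_tlt_ts: float = 0.5               # segmentation-only TLT-TS
    tau_cats: float = 0.3                 # coherence-aware CATS
```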
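
The contrastive margin δ_coh = 1 together with the snippet-scrambling probabilities suggests a hinge-style coherence objective that pushes a correct snippet's coherence score above its scrambled counterpart's score by at least the margin. The sketch below shows one plausible form under that assumption; the exact objective is defined in the paper, and the function name coherence_hinge_loss is hypothetical.

```python
def coherence_hinge_loss(coh_correct: float, coh_scrambled: float,
                         delta_coh: float = 1.0) -> float:
    """Hinge penalty: zero once the correct snippet outscores the scrambled
    one by at least delta_coh, linear otherwise. A plausible form given the
    quoted margin, not necessarily the authors' exact formulation."""
    return max(0.0, delta_coh - coh_correct + coh_scrambled)


# Example: the margin is violated by 0.25, so the penalty is 0.25.
print(coherence_hinge_loss(coh_correct=1.25, coh_scrambled=0.5))  # -> 0.25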
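
The reported inference thresholds (τ = 0.5 for TLT-TS, τ = 0.3 for CATS) imply a simple decision rule at prediction time: a sentence is marked as the end of a segment when its predicted boundary probability reaches the threshold. A minimal sketch of that rule follows; the helper name predict_boundaries and the assumption that the model emits one boundary probability per sentence are illustrative.

```python
from typing import List


def predict_boundaries(boundary_probs: List[float], tau: float) -> List[int]:
    """Indices of sentences predicted to close a segment: p >= tau."""
    return [i for i, p in enumerate(boundary_probs) if p >= tau]


# Example: with the CATS threshold tau = 0.3, sentences 1 and 4 end segments;
# with the TLT-TS threshold tau = 0.5, only sentence 1 does.
probs = [0.10, 0.60, 0.20, 0.05, 0.35]
print(predict_boundaries(probs, tau=0.3))  # -> [1, 4]
print(predict_boundaries(probs, tau=0.5))  # -> [1]
```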