Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation
Authors: Goran Glavaš, Swapna Somasundaran
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show (1) that even without the auxiliary coherence objective, the Two-Level Transformer model for Text Segmentation (TLT-TS) yields state-of-the-art performance across multiple benchmarks, (2) that the full CATS model, with the auxiliary coherence modeling, further significantly improves the segmentation, and (3) that both TLT-TS and CATS are robust in domain transfer. Furthermore, we demonstrate the models' effectiveness in zero-shot language transfer. |
| Researcher Affiliation | Collaboration | Goran Glavaš (1), Swapna Somasundaran (2); (1) Data and Web Science Research Group, University of Mannheim, goran@informatik.uni-mannheim.de; (2) Educational Testing Service (ETS), ssomasundaran@ets.org |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code, nor does it state that the code will be released. |
| Open Datasets | Yes | Koshorek et al. (2018) leveraged the manual structuring of Wikipedia pages into sections to automatically create a large segmentation-annotated corpus. WIKI-727K consists of 727,746 documents created from English (EN) Wikipedia pages, divided into training (80%), development (10%), and test portions (10%). We train, optimize, and evaluate our models on respective portions of the WIKI-727K dataset. |
| Dataset Splits | Yes | WIKI-727K consists of 727,746 documents created from English (EN) Wikipedia pages, divided into training (80%), development (10%), and test portions (10%). |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'FASTTEXT word embeddings' and 'Adam optimization algorithm' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We found the following configuration to lead to robust performance for both TLT-TS and CATS: (1) training instance preparation: snippet size of K = 16 sentences with T = 50 tokens; scrambling probabilities p1 = p2 = 0.5; (2) configuration of Transformers: N_TT = N_TS = 6 layers with 4 attention heads per layer in both transformers; (3) other model hyperparameters: positional embedding size of dp = 10; coherence objective contrastive margin of δcoh = 1. We found different optimal inference thresholds: τ = 0.5 for the segmentation-only TLT-TS model and τ = 0.3 for the coherence-aware CATS model. We trained both TLT-TS and CATS in batches of N = 32 snippets (each with K = 16 sentences), using the Adam optimization algorithm (Kingma and Ba 2014) with the initial learning rate set to 10^-4. (A configuration sketch collecting these values follows the table.) |
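
The sketch below collects the hyperparameters reported in the Experiment Setup row into a single Python configuration, together with a minimal thresholded inference step. It is a reproduction aid under stated assumptions, not the authors' implementation (the paper releases no code); identifiers such as `TRAINING_CONFIG` and `predict_boundaries` are hypothetical.

```python
# Hypothetical reproduction sketch: hyperparameters reported for TLT-TS / CATS,
# gathered into one config dict. Names are illustrative, not the authors' code.

TRAINING_CONFIG = {
    # (1) training instance preparation
    "snippet_size_K": 16,            # sentences per snippet
    "tokens_per_sentence_T": 50,
    "scramble_prob_p1": 0.5,
    "scramble_prob_p2": 0.5,
    # (2) Transformer configuration
    "token_transformer_layers_N_TT": 6,
    "sentence_transformer_layers_N_TS": 6,
    "attention_heads_per_layer": 4,
    # (3) other model hyperparameters
    "positional_embedding_dim_dp": 10,
    "coherence_contrastive_margin_delta": 1.0,
    # optimization
    "batch_size_N": 32,              # snippets per batch
    "learning_rate": 1e-4,           # Adam (Kingma and Ba 2014)
    # inference thresholds tuned separately per model
    "segmentation_threshold_tau": {"TLT-TS": 0.5, "CATS": 0.3},
}


def predict_boundaries(boundary_probs, tau):
    """Illustrative inference step: a sentence is predicted to end a segment
    when its boundary probability exceeds the threshold tau."""
    return [p > tau for p in boundary_probs]


if __name__ == "__main__":
    probs = [0.10, 0.70, 0.20, 0.45, 0.90]
    tau_cats = TRAINING_CONFIG["segmentation_threshold_tau"]["CATS"]
    print(predict_boundaries(probs, tau_cats))  # [False, True, False, True, True]
```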