Fine-Grained Argument Unit Recognition and Classification
Authors: Dietrich Trautmann, Johannes Daxenberger, Christian Stab, Hinrich Schütze, Iryna Gurevych
AAAI 2020, pp. 9048-9056 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a dataset of arguments from heterogeneous sources annotated as spans of tokens within a sentence, as well as with a corresponding stance. We show that and how such difficult argument annotations can be effectively collected through crowdsourcing with high inter-annotator agreement. The new benchmark, AURC-8, contains up to 15% more arguments per topic as compared to annotations on the sentence level. We identify a number of methods targeted at AURC sequence labeling, achieving close to human performance on known domains. Further analysis also reveals that, contrary to previous approaches, our methods are more robust against sentence segmentation errors. We publicly release our code and the AURC-8 dataset. |
| Researcher Affiliation | Academia | Center for Information and Language Processing (CIS), LMU Munich, Germany Ubiquitous Knowledge Processing Lab (UKP-TUDA), TU Darmstadt, Germany |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | We publicly release our code and the AURC-8 dataset: https://github.com/trtm/AURC |
| Open Datasets | Yes | We publicly release our code and the AURC-8 dataset: https://github.com/trtm/AURC |
| Dataset Splits | Yes | As a result, there are 4000 samples in train, 800 in dev and 2000 in test for the cross-domain split; and 4200 samples in train, 600 in dev and 1200 in test for the in-domain split. (See the split-size sketch below the table.) |
| Hardware Specification | No | BERT-Base, which requires only one GPU for training, is a good option if computational resources are limited. (No specific GPU model, processor, or memory details are provided.) |
| Software Dependencies | No | The paper mentions software such as FLAIR, BERT, spaCy, Elasticsearch, jusText, and the ArgumenText Classify API, but it does not specify version numbers for these dependencies, only their general usage. |
| Experiment Setup | Yes | This section lists the hyperparameters used for the experimental systems described in the main part of the paper. For FLAIR, the token-level model used a learning rate of 1e-1 with gradual decay and a hidden size of 256; the sentence-level model used the same learning-rate schedule with a hidden size of 512. For BERT-Large and BERT-Large+CRF, the large cased pretrained model with whole-word masking was used, with a learning rate of 1e-5 in the token-level setup for both in-domain and cross-domain. The learning rate was kept at 4e-5 for the sentence-level BERT-Large model and at 1e-5 for BERT-Large+CRF, using the AdamW optimizer. The maximum length of the tokenized BERT input was set to 64 tokens, and the dropout rate was always 0.1. All experiments were run three times with different seeds, with a training batch size of 32 and a maximum of 100 epochs, with early stopping if the performance did not improve (or the loss did not decrease) significantly within ten epochs. (A hedged configuration sketch based on these values follows the table.) |
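
For readers who want to sanity-check the reported split sizes, the snippet below collects them in one place. This is a minimal sketch assuming a plain Python layout; only the counts themselves come from the paper, and the names `AURC8_SPLIT_SIZES` and `summarize` are illustrative, not part of the released code.

```python
# Split sizes as reported in the paper's Dataset Splits statement.
# Only the counts come from the paper; the layout and helper are illustrative.
AURC8_SPLIT_SIZES = {
    "cross-domain": {"train": 4000, "dev": 800, "test": 2000},
    "in-domain": {"train": 4200, "dev": 600, "test": 1200},
}

def summarize(setting: str) -> str:
    """Return a one-line summary of the reported sizes for one split setting."""
    sizes = AURC8_SPLIT_SIZES[setting]
    total = sum(sizes.values())
    return (f"{setting}: train={sizes['train']}, dev={sizes['dev']}, "
            f"test={sizes['test']} (total {total})")

if __name__ == "__main__":
    for setting in AURC8_SPLIT_SIZES:
        print(summarize(setting))
```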
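
The Experiment Setup row above can likewise be condensed into a small configuration object for anyone attempting a reimplementation. This is a minimal sketch assuming a PyTorch / Hugging Face style setup; the dataclass, its field names, and the checkpoint identifier `bert-large-cased-whole-word-masking` are assumptions, while the numeric values mirror the hyperparameters reported for the token-level BERT-Large model.

```python
from dataclasses import dataclass

# Hyperparameters taken from the Experiment Setup row above. The dataclass,
# field names, and checkpoint identifier are assumptions, not the authors' code.
@dataclass(frozen=True)
class BertLargeTokenLevelConfig:
    pretrained_model: str = "bert-large-cased-whole-word-masking"  # assumed checkpoint name
    learning_rate: float = 1e-5        # token-level, in-domain and cross-domain
    max_seq_length: int = 64           # max. tokenized BERT input length
    dropout: float = 0.1
    batch_size: int = 32
    max_epochs: int = 100
    early_stopping_patience: int = 10  # stop if no significant improvement for ten epochs
    num_runs: int = 3                  # three runs with different (unreported) seeds

config = BertLargeTokenLevelConfig()

# Optimizer wiring as a sketch, assuming a PyTorch `model` object exists;
# the paper names AdamW but does not publish this training loop.
# import torch
# optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
```

The FLAIR settings (learning rate 1e-1 with gradual decay, hidden sizes 256 and 512) and the sentence-level BERT-Large learning rate of 4e-5 would be captured by analogous configuration objects.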