Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards

Authors: Rahul Aralikatte, Mostafa Abdou, Heather C Lent, Daniel Hershcovich, Anders Søgaard (pp. 12516-12525)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments across six encoders of different complexities, six different coreference resolution datasets, and four different SRL datasets, showing improvements across all encoders for coreference resolution, and on 4/6 for SRL, for single-task setups; and similar improvements in multi-task setups, where encoder parameters are shared across the two tasks.
Researcher Affiliation | Academia | Rahul Aralikatte, Mostafa Abdou, Heather C Lent, Daniel Hershcovich and Anders Søgaard, University of Copenhagen, {rahul, abdou, hcl, dh, soegaard}@di.ku.dk
Pseudocode | Yes | Algorithm 1: Training Coherence Classifiers
Open Source Code | Yes | Our code will be made publicly available at https://github.com/rahular/joint-coref-srl
Open Datasets | Yes | For supervised training, we use data from the CoNLL-2012 shared task (Pradhan et al. 2012), which contains data from OntoNotes 5.0 with annotations for both coreference resolution and semantic role labeling.
Dataset Splits | Yes | We reduce the learning rates by a factor of 2 if the evaluation on the development sets does not improve after every other epoch. The training is stopped either after 100 epochs, or when the minimum learning rate of 10^-7 is reached.
Hardware Specification | Yes | All experiments are run on a single GPU with 16GB memory.
Software Dependencies | No | The paper mentions models and optimizers like 'GloVe', 'BERT', and the 'Adam optimizer', but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation.
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba 2015) with a weight decay of 0.01 and initial learning rate of 10^-3. For BERT parameters, the learning rate is lowered to 10^-5. We reduce the learning rates by a factor of 2 if the evaluation on the development sets does not improve after every other epoch. The training is stopped either after 100 epochs, or when the minimum learning rate of 10^-7 is reached. In the multi-task setup, we sample a batch from each task with a frequency proportional to the dataset size of that task. All experiments are run on a single GPU with 16GB memory. The hyperparameters were manually selected to accommodate for training time and resource limitation, and were not tuned based on model evaluation. ... The supervised models are fine-tuned for 10 epochs with the same optimizer configuration. Only the learning rate is changed to 3 × 10^-4.
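
The Pseudocode row above refers to the paper's Algorithm 1 for training coherence classifiers. As a rough, non-authoritative illustration of what such a training loop can look like, the sketch below trains a binary classifier to separate gold (coherent) document representations from perturbed (incoherent) ones; the encoder stand-in, the noise-based perturbation, and all hyperparameters are assumptions made for illustration and are not taken from the paper's Algorithm 1.

```python
# Hedged sketch: a generic document-level coherence classifier trained to
# separate gold (coherent) annotations from perturbed (incoherent) ones.
# The encoder, perturbation function, and data are placeholders, not the
# paper's actual Algorithm 1.
import torch
import torch.nn as nn

class CoherenceClassifier(nn.Module):
    def __init__(self, doc_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(doc_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, doc_repr: torch.Tensor) -> torch.Tensor:
        # doc_repr: (batch, doc_dim) pooled document+annotation representation
        return self.scorer(doc_repr).squeeze(-1)  # coherence logit per document

def perturb(doc_repr: torch.Tensor) -> torch.Tensor:
    # Placeholder negative sampling: additive noise stands in for corrupting
    # the document's coreference/SRL annotations (an assumption, not the paper's method).
    return doc_repr + 0.5 * torch.randn_like(doc_repr)

model = CoherenceClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    gold = torch.randn(8, 768)            # stand-in for encoded gold documents
    neg = perturb(gold)                   # incoherent counterparts
    inputs = torch.cat([gold, neg], dim=0)
    labels = torch.cat([torch.ones(8), torch.zeros(8)])
    loss = loss_fn(model(inputs), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```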
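
The Dataset Splits and Experiment Setup rows quote the optimizer and scheduling details: Adam with weight decay 0.01, a base learning rate of 10^-3 lowered to 10^-5 for BERT parameters, learning rates halved when development-set performance stalls, and training stopped after 100 epochs or once the minimum learning rate of 10^-7 is reached. A minimal PyTorch sketch of such a configuration follows; the module layout, the use of ReduceLROnPlateau, and the placeholder data are assumptions about how this could be implemented, not the authors' code.

```python
import torch
import torch.nn as nn

# Hedged sketch of the quoted setup: Adam with weight decay 0.01, lr 1e-3
# for task parameters and 1e-5 for BERT parameters, learning rates halved
# on development-set plateaus down to 1e-7, training capped at 100 epochs.
# The module layout and the dev-score placeholder are assumptions.

class JointModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(768, 768)       # stand-in for a BERT encoder
        self.task_head = nn.Linear(768, 2)    # stand-in for coref/SRL heads

    def forward(self, x):
        return self.task_head(self.bert(x))

model = JointModel()
optimizer = torch.optim.Adam(
    [
        {"params": model.bert.parameters(), "lr": 1e-5},
        {"params": model.task_head.parameters(), "lr": 1e-3},
    ],
    weight_decay=0.01,
)
# patience=1 approximates "reduce if the dev evaluation does not improve
# after every other epoch".
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=1, min_lr=1e-7
)

for epoch in range(100):                      # quoted cap of 100 epochs
    x, y = torch.randn(32, 768), torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dev_score = -loss.item()                  # placeholder for a dev-set metric
    scheduler.step(dev_score)
    if all(g["lr"] <= 1e-7 for g in optimizer.param_groups):
        break                                 # quoted minimum learning rate reached
```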
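
The Experiment Setup row also states that, in the multi-task setup, a batch is sampled from each task with frequency proportional to that task's dataset size. The short sketch below shows one way to implement proportional task sampling; the task names and dataset sizes are illustrative placeholders, not the actual dataset statistics.

```python
import random

# Hedged sketch: choose which task to draw the next batch from with
# probability proportional to that task's dataset size.
dataset_sizes = {"coref": 3000, "srl": 90000}  # placeholder sizes

def sample_task(sizes: dict) -> str:
    tasks, weights = zip(*sizes.items())
    return random.choices(tasks, weights=weights, k=1)[0]

counts = {t: 0 for t in dataset_sizes}
for _ in range(10_000):
    counts[sample_task(dataset_sizes)] += 1
print(counts)  # counts come out roughly proportional to the dataset sizes
```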