Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards
Authors: Rahul Aralikatte, Mostafa Abdou, Heather C Lent, Daniel Hershcovich, Anders Søgaard
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments across six encoders of different complexities, six different coreference resolution datasets, and four different SRL datasets, showing improvements across all encoders for coreference resolution, and on 4/6 for SRL, for single-task setups; and similar improvements in multi-task setups, where encoder parameters are shared across the two tasks. |
| Researcher Affiliation | Academia | Rahul Aralikatte, Mostafa Abdou, Heather C Lent, Daniel Hershcovich and Anders Søgaard University of Copenhagen {rahul, abdou, hcl, dh, soegaard}@di.ku.dk |
| Pseudocode | Yes | Algorithm 1 Training Coherence Classifiers |
| Open Source Code | Yes | Our code will be made publicly available at https://github.com/rahular/joint-coref-srl |
| Open Datasets | Yes | For supervised training, we use data from the CoNLL-2012 shared task (Pradhan et al. 2012), which contains data from OntoNotes 5.0 with annotations for both coreference resolution and semantic role labeling. |
| Dataset Splits | Yes | We reduce the learning rates by a factor of 2 if the evaluation on the development sets does not improve after every other epoch. The training is stopped either after 100 epochs, or when the minimum learning rate of 10^-7 is reached. |
| Hardware Specification | Yes | All experiments are run on a single GPU with 16GB memory. |
| Software Dependencies | No | The paper mentions models and optimizers like 'GloVe', 'BERT', and 'Adam optimizer', but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba 2015) with a weight decay of 0.01 and initial learning rate of 10^-3. For BERT parameters, the learning rate is lowered to 10^-5. We reduce the learning rates by a factor of 2 if the evaluation on the development sets does not improve after every other epoch. The training is stopped either after 100 epochs, or when the minimum learning rate of 10^-7 is reached. In the multi-task setup, we sample a batch from each task with a frequency proportional to the dataset size of that task. All experiments are run on a single GPU with 16GB memory. The hyperparameters were manually selected to accommodate for training time and resource limitation, and were not tuned based on model evaluation. ... The supervised models are fine-tuned for 10 epochs with the same optimizer configuration. Only the learning rate is changed to 3×10^-4. |
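
Below is a minimal PyTorch sketch of the optimizer and learning-rate schedule described in the Experiment Setup row above (Adam with weight decay 0.01, lr 10^-3 for task parameters and 10^-5 for BERT, halving on dev-set plateau, stopping at 100 epochs or at the minimum lr of 10^-7). The `JointModel` class, parameter-group names, `patience` value, and placeholder evaluation are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Stand-in for the joint coreference/SRL model: "bert" is a dummy module here so
# the parameter grouping below runs; the real model wraps a BERT encoder.
class JointModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(768, 768)      # placeholder for the BERT encoder
        self.coref_head = nn.Linear(768, 2)  # placeholder task-specific head
        self.srl_head = nn.Linear(768, 2)    # placeholder task-specific head

model = JointModel()

# Two parameter groups: 10^-3 for task-specific parameters, 10^-5 for BERT,
# both with weight decay 0.01, as reported in the paper.
bert_params = [p for n, p in model.named_parameters() if n.startswith("bert")]
task_params = [p for n, p in model.named_parameters() if not n.startswith("bert")]
optimizer = Adam(
    [{"params": task_params, "lr": 1e-3},
     {"params": bert_params, "lr": 1e-5}],
    weight_decay=0.01,
)

# Halve the learning rates ("factor of 2") when the dev metric stops improving,
# down to the minimum learning rate of 10^-7. patience=1 is a rough reading of
# "after every other epoch" and is an assumption.
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=1, min_lr=1e-7)

for epoch in range(100):
    dev_metric = 0.0  # placeholder for evaluation on the development sets
    scheduler.step(dev_metric)
    if all(group["lr"] <= 1e-7 for group in optimizer.param_groups):
        break  # stop once the minimum learning rate is reached
```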