Incorporating Constituent Syntax for Coreference Resolution

Authors: Fan Jiang, Trevor Cohn

AAAI 2022, pp. 10831-10839

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
Researcher Affiliation | Academia | School of Computing and Information Systems, The University of Melbourne, Victoria, Australia
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code is available at https://github.com/Fantabulous-J/Coref-Constituent-Graph.
Open Datasets | Yes | Our model is evaluated on the English and Chinese portions of the OntoNotes 5.0 dataset (Pradhan et al. 2012).
Dataset Splits | Yes | The English corpus consists of 2802, 343 and 348 documents in the training, development and test splits, respectively, while the Chinese corpus contains 1810, 252 and 218 documents for the train/dev/test splits.
Hardware Specification | No | The paper acknowledges the 'LIEF HPC-GPGPU Facility hosted at the University of Melbourne' but does not specify the exact GPU/CPU models or other hardware details used for the experiments.
Software Dependencies | No | The paper mentions software such as PyTorch, Deep Graph Library, SpanBERT, BERT-wwm-base and RoBERTa-wwm-ext-large, but does not provide version numbers for these components.
Experiment Setup | Yes | The learning rates for finetuning the base and large models are 2×10⁻⁵ and 1×10⁻⁵, respectively. The learning rates of the task-specific parameters are 3×10⁻⁴ and 5×10⁻⁴ for English, and 5×10⁻⁴ for Chinese, when using the base and large models, respectively. Both the BERT and task parameters are trained using the Adam optimizer (Kingma and Ba 2015), with a warmup scheduler for the first 10% of training steps and a linear decay scheduler decreasing to 0, respectively. The number of attention heads is set to 4 and 8 for the base and large models. The size of the constituent type embeddings is 300. The number of graph attention layers is set to 2.
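
To make the reported setup concrete, here is a minimal PyTorch sketch of the optimizer and schedule described in the Experiment Setup row. This is not the authors' code: the helper name `build_optimizer` and the `encoder` parameter-name prefix are illustrative assumptions, and the warmup-plus-linear-decay schedule is realized with the Hugging Face `transformers` utility rather than whatever the authors implemented.

```python
import torch
from transformers import get_linear_schedule_with_warmup

def build_optimizer(model, total_steps):
    # Separate the pretrained encoder parameters from the task-specific ones;
    # the "encoder" name prefix is an assumption for illustration.
    encoder_params = [p for n, p in model.named_parameters() if n.startswith("encoder")]
    task_params = [p for n, p in model.named_parameters() if not n.startswith("encoder")]

    optimizer = torch.optim.Adam([
        {"params": encoder_params, "lr": 2e-5},  # 2e-5 (base) / 1e-5 (large), per the paper
        {"params": task_params, "lr": 3e-4},     # 3e-4 or 5e-4 depending on language/model size
    ])
    # Warmup over the first 10% of training steps, then linear decay to 0.
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),
        num_training_steps=total_steps,
    )
    return optimizer, scheduler
```

Likewise, the graph attention hyperparameters (2 layers, 4 or 8 heads, 300-dimensional constituent type embeddings) could plausibly be realized with the Deep Graph Library as in the sketch below; the module structure, the `num_types` placeholder, and the head-merging choices are assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
from dgl.nn import GATConv  # Deep Graph Library, cited by the paper

class ConstituentGraphEncoder(nn.Module):
    """Hypothetical 2-layer graph attention stack over constituent nodes."""

    def __init__(self, hidden_size, num_heads=4, num_types=50, type_dim=300):
        super().__init__()
        # 300-dimensional constituent type embeddings (per the paper);
        # num_types is a placeholder for the size of the type inventory.
        self.type_embedding = nn.Embedding(num_types, type_dim)
        # Two GAT layers; 4 heads for the base model, 8 for the large one.
        self.gat1 = GATConv(hidden_size + type_dim, hidden_size, num_heads)
        self.gat2 = GATConv(hidden_size * num_heads, hidden_size, num_heads)

    def forward(self, graph, node_feats, type_ids):
        h = torch.cat([node_feats, self.type_embedding(type_ids)], dim=-1)
        h = self.gat1(graph, h).flatten(1)  # concatenate the head outputs
        h = self.gat2(graph, h).mean(1)     # average heads in the final layer
        return h
```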