Incorporating Constituent Syntax for Coreference Resolution
Authors: Fan Jiang, Trevor Cohn
AAAI 2022, pp. 10831-10839 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance. |
| Researcher Affiliation | Academia | School of Computing and Information Systems The University of Melbourne, Victoria, Australia |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/Fantabulous-J/Coref-Constituent-Graph. |
| Open Datasets | Yes | Our model is evaluated on the English and Chinese portions of the OntoNotes 5.0 dataset (Pradhan et al. 2012). |
| Dataset Splits | Yes | The English corpus consists of 2802, 343 and 348 documents in the training, development and test splits, respectively, while the Chinese corpus contains 1810, 252 and 218 documents for the train/dev/test splits. |
| Hardware Specification | No | The paper mentions the 'LIEF HPC-GPGPU Facility hosted at the University of Melbourne' but does not specify exact GPU/CPU models or other detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper mentions software like 'PyTorch', 'Deep Graph Library', 'SpanBERT', 'BERT-wwm-base', and 'RoBERTa-wwm-ext-large' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The learning rates for finetuning the base and large models are 2 × 10^-5 and 1 × 10^-5. The learning rates of task-specific parameters are 3 × 10^-4 and 5 × 10^-4 for English, and 5 × 10^-4 for Chinese, when using the base and large model, respectively. Both BERT and task parameters are trained using the Adam optimizer (Kingma and Ba 2015), with a warmup learning scheduler for the first 10% of training steps and a linear decay scheduler decreasing to 0, respectively. The number of heads is set to 4 and 8 for the base and large models. The size of constituent type embeddings is 300. We set the number of graph attention layers as 2. |
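
A minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row: Adam with separate parameter groups for the finetuned encoder and the task-specific layers, plus per-group learning-rate schedules. The function name, parameter-group handles, and `total_steps` argument are illustrative, and assigning warmup-plus-decay to the encoder group versus plain linear decay to the task group is an assumed reading of "respectively" in the quoted text, not the authors' released code.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_schedule(bert_params, task_params, total_steps,
                                 bert_lr=2e-5, task_lr=3e-4, warmup_ratio=0.1):
    """Adam with per-group learning rates and schedules, following the quoted setup
    (default values correspond to the English base-model configuration)."""
    optimizer = Adam([
        {"params": bert_params, "lr": bert_lr},  # finetuned encoder parameters
        {"params": task_params, "lr": task_lr},  # task-specific parameters
    ])
    warmup_steps = int(warmup_ratio * total_steps)

    def warmup_then_linear_decay(step):
        # Linear warmup over the first 10% of steps, then linear decay to 0.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    def linear_decay(step):
        # Plain linear decay to 0 (assumed schedule for the task parameters).
        return max(0.0, 1.0 - step / max(1, total_steps))

    scheduler = LambdaLR(optimizer, lr_lambda=[warmup_then_linear_decay, linear_decay])
    return optimizer, scheduler
```

Calling `scheduler.step()` once per training step keeps both parameter groups on their respective schedules while sharing a single optimizer, which matches the two-learning-rate setup described in the quote.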