Commonsense Knowledge Base Completion with Structural and Semantic Context
Authors: Chaitanya Malaviya, Chandra Bhagavatula, Antoine Bosselut, Yejin Choi
AAAI 2020, pp. 2925-2933 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide the first empirical results for KB completion on ATOMIC and evaluation with ranking metrics on ConceptNet. Our results demonstrate the effectiveness of language model representations in boosting link prediction performance and the advantages of learning from local graph structure (+1.5 points in MRR for ConceptNet) when training on subgraphs for computational efficiency. Further analysis on model predictions shines light on the types of commonsense knowledge that language models capture well. (An MRR sketch follows this table.) |
| Researcher Affiliation | Collaboration | Allen Institute for Artificial Intelligence University of Washington {chaitanyam, chandrab}@allenai.org, {antoineb, yejin}@cs.washington.edu |
| Pseudocode | No | The paper describes the model architecture and equations but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code and dataset are available at github.com/allenai/commonsense-kg-completion. |
| Open Datasets | Yes | We focus our experiments on two prominent knowledge graphs: ConceptNet and ATOMIC. Statistics for both graphs are provided in Table 1, along with FB15K-237, a standard KB completion dataset. ... https://ttic.uchicago.edu/~kgimpel/commonsense.html and https://homes.cs.washington.edu/~msap/atomic/ |
| Dataset Splits | Yes | We used the original splits from the dataset, and combined the two provided development sets to create a larger development set. The development and test sets consisted of 1200 tuples each. ... The original dataset split was created to make the set of seed entities between the training and evaluation splits mutually exclusive. Since the KB completion task requires entities to be seen at least once, we create a new random 80-10-10 split for the dataset. The development and test sets consisted of 87K tuples each. (A minimal split sketch follows this table.) |
| Hardware Specification | Yes | For instance, the model with GCN and BERT representations for ATOMIC occupies 30GB memory and takes 8-10 days for training on a Quadro RTX 8000 GPU. |
| Software Dependencies | No | The paper mentions 'Deep Graph Library (DGL)' for implementation and finetuning 'BERT', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | BERT Fine-tuning: We used a maximum sequence length of 64, batch size of 32, and learning rate of 3e-5 to fine-tune the uncased BERT-Large model with the masked language modeling objective. The warmup proportion was set to 0.1. ... The graph convolutional network used 2 layers... an input and output embedding dimension of 200. The graph batch size used for subgraph sampling was 30000 edges. For the ConvTransE decoder, we used 500 channels, a kernel size of 5 and a batch size of 128. Dropout was enforced at the feature map layers, the input layer and after the fully connected layer in the decoder, with a value of 0.2. The Adam optimizer was used for optimization with a learning rate of 1e-4 and gradient clipping was performed with a max gradient norm value of 1.0. We performed L2 weight regularization with a weight of 0.1. We also used label smoothing with a value of 0.1. (These hyperparameters are gathered into a config sketch after this table.) |
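The ranking metric referenced in the Research Type excerpt is mean reciprocal rank (MRR). Below is a minimal sketch of how MRR is computed; the input format (1-based ranks of each gold entity among all scored candidates) is an assumption for illustration and is not taken from the authors' evaluation code.

```python
# Minimal MRR sketch: `ranks` holds the 1-based rank of each gold tail
# entity among all candidate entities scored by the model
# (an assumed input format, not the paper's evaluation pipeline).
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluation triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Example: gold entities ranked 1st, 4th, and 10th by the model.
print(mean_reciprocal_rank([1, 4, 10]))  # 0.45
```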
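For the ATOMIC split described under Dataset Splits, the following is a minimal sketch of a random 80-10-10 partition over triples. The paper only states that the split is random with 80/10/10 proportions; the function name, seed, and shuffling procedure are assumptions.

```python
import random

# Hypothetical helper: randomly partition (head, relation, tail) triples
# into 80% train / 10% dev / 10% test, as described for ATOMIC.
def random_split(triples, train_frac=0.8, dev_frac=0.1, seed=0):
    items = list(triples)
    random.Random(seed).shuffle(items)  # fixed seed is an assumption
    n = len(items)
    n_train = int(train_frac * n)
    n_dev = int(dev_frac * n)
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])
```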
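The hyperparameters quoted under Experiment Setup are gathered below into a single Python dict for readability. The key names and grouping are illustrative assumptions; the released repository may expose these options under different names or flags.

```python
# Hyperparameters as reported in the paper, collected in one place.
# Key names are assumptions made for this sketch.
CONFIG = {
    # BERT fine-tuning (uncased BERT-Large, masked LM objective)
    "bert_max_seq_length": 64,
    "bert_batch_size": 32,
    "bert_learning_rate": 3e-5,
    "bert_warmup_proportion": 0.1,
    # Graph convolutional encoder
    "gcn_num_layers": 2,
    "embedding_dim": 200,        # input and output embedding dimension
    "graph_batch_size": 30000,   # edges per sampled subgraph
    # ConvTransE decoder
    "decoder_channels": 500,
    "decoder_kernel_size": 5,
    "decoder_batch_size": 128,
    "dropout": 0.2,              # feature maps, input layer, and FC layer
    # Optimization
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "max_grad_norm": 1.0,
    "l2_weight": 0.1,
    "label_smoothing": 0.1,
}
```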