Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings

Authors: Hongyu Ren*, Weihua Hu*, Jure Leskovec

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of QUERY2BOX on three large KGs and show that QUERY2BOX achieves up to 25% relative improvement over the state of the art.
Researcher Affiliation | Academia | Hongyu Ren, Weihua Hu, Jure Leskovec. Department of Computer Science, Stanford University. {hyren,weihuahu,jure}@cs.stanford.edu
Pseudocode | No | The paper describes the logical operations (box projection and intersection) and their mathematical formulations, but it does not include a distinct pseudocode block or a section explicitly labeled "Algorithm". (A hedged sketch of these operators is given after the table.)
Open Source Code | Yes | Project website with data and code: http://snap.stanford.edu/query2box
Open Datasets | Yes | We perform experiments on three standard KG benchmarks: FB15k (Bordes et al., 2013), FB15k-237 (Toutanova & Chen, 2015), and NELL995 (Xiong et al., 2017).
Dataset Splits | Yes | Given the standard split of edges into training, validation, and test sets, we first augment the KG to also include inverse relations, effectively doubling the number of edges in the graph. We then create three graphs: Gtrain, which contains only training edges and is used to train node embeddings as well as box operators; Gvalid, which contains Gtrain plus the validation edges; and Gtest, which includes Gvalid as well as the test edges. (See the split-construction sketch after the table.)
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU models, CPU types, or cloud instance details.
Software Dependencies | No | The paper mentions using the "Adam Optimizer (Kingma & Ba, 2015)" but does not provide version numbers for the key software components, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | We use embedding dimensionality of d = 400 and set γ = 24, α = 0.2 for the loss in Eq. 4. We train all types of training queries jointly. In every iteration, we sample a minibatch of 512 queries for each query structure (details in Appendix D), and we sample 1 answer entity and 128 negative entities for each query. We optimize the loss in Eq. 4 using the Adam Optimizer (Kingma & Ba, 2015) with learning rate 0.0001. We train all models for 250 epochs, monitor the performance on the validation set, and report the test performance. (A sketch of this training objective follows the table.)
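Since the paper formulates its logical operators mathematically rather than as pseudocode, a minimal sketch of the two box operators may be useful. This is an approximation of the paper's Eq. 2-3, not its implementation: `att_weights` and `deepsets_out` stand in for the outputs of learned attention and DeepSets networks, and all names here are illustrative.

```python
import numpy as np

# A box is (center, offset), both d-dimensional vectors with offset >= 0.

def project(center, offset, rel_center, rel_offset):
    """Relation projection: translate the box center by the relation's
    center embedding and grow the offset by the relation's offset embedding."""
    return center + rel_center, offset + rel_offset

def intersect(centers, offsets, att_weights, deepsets_out):
    """Intersection of n boxes (arrays of shape (n, d)).
    The new center is a dimension-wise attention average of the input
    centers; the new offset shrinks toward the elementwise minimum,
    gated by a sigmoid of a permutation-invariant (DeepSets-style)
    summary, so the result is contained in each input box."""
    a = np.exp(att_weights) / np.exp(att_weights).sum(axis=0)  # softmax over inputs
    center = (a * centers).sum(axis=0)
    offset = offsets.min(axis=0) * (1.0 / (1.0 + np.exp(-deepsets_out)))
    return center, offset
```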
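The split construction quoted under "Dataset Splits" is straightforward to reproduce. Below is a small sketch, assuming edges are (head, relation, tail) triples; the `"_reverse"` suffix for inverse relations and the toy triples are assumptions for illustration, not the paper's notation.

```python
def add_inverses(edges):
    """Augment an edge list with inverse relations, doubling its size."""
    return edges + [(t, r + "_reverse", h) for (h, r, t) in edges]

# Toy placeholder triples standing in for the benchmark splits.
train_edges = [("Hinton", "advisor_of", "Sutskever")]
valid_edges = [("Sutskever", "cofounder_of", "OpenAI")]
test_edges  = [("Hinton", "works_at", "Google")]

g_train = add_inverses(train_edges)            # train embeddings and box operators here
g_valid = g_train + add_inverses(valid_edges)  # Gtrain plus validation edges
g_test  = g_valid + add_inverses(test_edges)   # Gvalid plus test edges
```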
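Finally, a hedged sketch of the training objective from the quoted setup (γ = 24, α = 0.2, one answer entity and 128 negatives per query). The box distance follows the paper's outside/inside decomposition and the loss follows the negative-sampling form of its Eq. 4, but the function names and the NumPy formulation are ours, not the released code.

```python
import numpy as np

GAMMA, ALPHA = 24.0, 0.2  # margin and inside-distance weight from the quoted setup

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def box_dist(v, center, offset, alpha=ALPHA):
    """Distance from entity embedding v to box (center, offset):
    dist_outside + alpha * dist_inside, both L1 norms."""
    q_max, q_min = center + offset, center - offset
    outside = np.abs(np.maximum(v - q_max, 0) + np.maximum(q_min - v, 0)).sum()
    inside = np.abs(center - np.minimum(q_max, np.maximum(q_min, v))).sum()
    return outside + alpha * inside

def query_loss(pos, negs, center, offset, gamma=GAMMA):
    """Negative-sampling loss in the spirit of the paper's Eq. 4:
    pull the answer inside the box, push negatives at least gamma away."""
    loss = -np.log(sigmoid(gamma - box_dist(pos, center, offset)))
    loss += -np.mean([np.log(sigmoid(box_dist(n, center, offset) - gamma))
                      for n in negs])
    return loss
```

In the paper's setup this loss would be minimized with Adam at learning rate 0.0001 over minibatches of 512 queries per query structure.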