Reasoning on Knowledge Graphs with Debate Dynamics
Authors: Marcel Hildebrandt, Jorge Andres Quintero Serna, Yunpu Ma, Martin Ringsquandl, Mitchell Joblin, Volker Tresp
AAAI 2020, pages 4123-4131
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. |
| Researcher Affiliation | Collaboration | Siemens Corporate Technology, Ludwig Maximilian University |
| Pseudocode | Yes | Algorithm 1 contains a pseudocode of R2D2 at inference time. |
| Open Source Code | Yes | The datasets along with the code of R2D2 are available at https://github.com/m-hildebrandt/R2D2. |
| Open Datasets | Yes | We measure the performance of R2D2 with respect to the triple classification and the KG completion task on the benchmark datasets FB15k-237 (Toutanova et al. 2015) and WN18RR (Dettmers et al. 2018). To test R2D2 on a real world task we also consider Hetionet (Himmelstein and Baranzini 2015)... |
| Dataset Splits | Yes | Thereby the canonical splits of the datasets into a training, validation, and test set are used. In particular, we ensured that triples that are assigned to the validation or test set (and their respective inverse relations) are not included in the KG during training. The results on the test set of all methods are reported based on the hyperparameters that showed the best performance (based on the highest accuracy for triple classification and the highest MRR for link prediction) on the validation set. |
| Hardware Specification | Yes | All experiments were conducted on a machine with 48 CPU cores and 96 GB RAM. |
| Software Dependencies | No | The paper mentions algorithms like LSTMs and Adam, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | We considered the following hyperparameter ranges for R2D2: The number of latent dimensions d for the embeddings is chosen from the range {32, 64, 128}. The number of LSTM layers for the agents is chosen from {1, 2, 3}. The number of layers in the MLP for the judge is tuned in the range {1, 2, 3, 4, 5}. β was chosen from {0.02, 0.05, 0.1}. The length of each argument T was tuned in the range {1, 2, 3} and the number of debate rounds N was set to 3. Moreover, the L2-regularization strength λ is set to 0.02. Furthermore, the number of rollouts during training is given by 20 and 50 (triple classification) or 100 (KG completion) at test time. The loss of the judge and the rewards of the agents were optimized using Adam with learning rate 10^-4. The best hyperparameters are reported in Table 3. |
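The Dataset Splits row quotes the paper's leakage check: validation and test triples, together with their inverse relations, must not appear in the training KG. A minimal sketch of such a filter is below; the function name, tuple layout, and the `_inv` suffix convention for inverse relations are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the leakage check described in the Dataset Splits row.
# A triple is a (head, relation, tail) tuple; inverse relations are assumed to
# be marked with an "_inv" suffix (an illustrative convention).
def remove_leaked_triples(train, heldout):
    """Return training triples with every held-out triple and its inverse removed."""
    def inverse(triple):
        h, r, t = triple
        r_inv = r[:-4] if r.endswith("_inv") else r + "_inv"
        return (t, r_inv, h)

    # Ban both the held-out triples themselves and their inverses.
    banned = set(heldout) | {inverse(tr) for tr in heldout}
    return [tr for tr in train if tr not in banned]

# Usage (names hypothetical):
# train_clean = remove_leaked_triples(train_triples, valid_triples + test_triples)
```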
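The Experiment Setup row enumerates the R2D2 search ranges and fixed settings. The sketch below collects them into a grid-search configuration; all variable names are illustrative, and the per-task test rollout counts (50 for triple classification, 100 for KG completion) are noted in a comment rather than modeled.

```python
# Hypothetical encoding of the hyperparameter grid reported in the paper;
# names are illustrative, not taken from the authors' repository.
from itertools import product

search_space = {
    "embedding_dim": [32, 64, 128],       # latent dimensions d
    "lstm_layers": [1, 2, 3],             # LSTM layers per agent
    "judge_mlp_layers": [1, 2, 3, 4, 5],  # MLP depth of the judge
    "beta": [0.02, 0.05, 0.1],            # the paper's beta hyperparameter
    "argument_length": [1, 2, 3],         # length T of each argument
}

# Settings the paper fixes rather than tunes. Test-time rollouts depend on
# the task: 50 for triple classification, 100 for KG completion.
fixed = {
    "debate_rounds": 3,     # N
    "l2_lambda": 0.02,      # L2-regularization strength
    "train_rollouts": 20,
    "learning_rate": 1e-4,  # Adam
}

# Enumerate every grid configuration (3 * 3 * 5 * 3 * 3 = 405 combinations).
keys = list(search_space)
configs = [dict(zip(keys, values), **fixed) for values in product(*search_space.values())]
print(len(configs))  # 405
```

The paper reports only the best configuration per dataset (its Table 3), so the full 405-point grid here is an upper bound on the search implied by the quoted ranges.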