IterDE: An Iterative Knowledge Distillation Framework for Knowledge Graph Embeddings
Authors: Jiajun Liu, Peng Wang, Ziyu Shang, Chenxiao Wu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that IterDE achieves a new state-of-the-art distillation performance for KGEs compared to strong baselines on the link prediction task. Significantly, IterDE can reduce the training time by 50% on average. Finally, more exploratory experiments show that the soft-label weighting dynamic adjustment mechanism and more fine-grained iterations can improve distillation performance. |
| Researcher Affiliation | Academia | Jiajun Liu, Peng Wang*, Ziyu Shang, Chenxiao Wu School of Computer Science and Engineering, Southeast University {jiajliu, pwang, ziyus1999, chenxiaowu}@seu.edu.cn |
| Pseudocode | No | The paper includes a framework overview diagram (Figure 2) but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of IterDE and the datasets can be accessed via https://github.com/seukgcode/IterDE. |
| Open Datasets | Yes | Datasets We use the two popular and open datasets in KGEs: FB15K-237 (Toutanova et al. 2015) and WN18RR (Dettmers et al. 2018). ... The detailed statistical information is shown in Table 1. |
| Dataset Splits | Yes | The detailed statistical information is shown in Table 1. Per dataset (entities / relations / train / valid / test triples): FB15K-237: 14,541 / 237 / 272,115 / 17,535 / 20,466; WN18RR: 40,943 / 11 / 86,835 / 3,034 / 3,134. |
| Hardware Specification | Yes | All experiments are implemented on a GeForce RTX 2080 Ti GPU. |
| Software Dependencies | Yes | The experiments are extended from OpenKE (Han et al. 2018), an open source library based on PyTorch (Paszke et al. 2019), with CUDA version 10.2.89. |
| Experiment Setup | Yes | In all experiments, we set the teacher model dimension to 512 and the student model dimension to 32. In distillation, we set the compression ratio of each layer α to 2 and the number of iterations N to 4. We set the hyperparameter p to 2, while 5 and 10 give similar results. We set the batch size to 1024 and the maximum number of epochs for each iteration to 1000. We use Adagrad as the optimizer, and the learning rate is chosen from [0.5, 0.1, 0.01]. The initial soft-label weight λ0 is chosen from [1, 0.1, 0.01]. A hedged configuration sketch based on these values follows the table. |
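
For readers mapping the reported hyperparameters onto a concrete setup, the sketch below shows one way those values could be wired into an iterative distillation loop with a dynamically weighted soft-label loss on top of PyTorch. All names, the code structure, and the particular weight-decay schedule are illustrative assumptions rather than the authors' implementation; only the numeric values are taken from the "Experiment Setup" row above.

```python
import torch

# Hyperparameter values from the "Experiment Setup" row; every name below and
# the soft-label weight schedule itself are illustrative assumptions.
TEACHER_DIM = 512        # teacher embedding dimension
STUDENT_DIM = 32         # final student embedding dimension
ALPHA = 2                # per-layer compression ratio
NUM_ITERATIONS = 4       # distillation iterations
P = 2                    # hyperparameter p controlling the weight schedule
BATCH_SIZE = 1024
MAX_EPOCHS = 1000        # maximum epochs per iteration
LEARNING_RATE = 0.1      # chosen from {0.5, 0.1, 0.01}
LAMBDA_0 = 0.1           # initial soft-label weight, chosen from {1, 0.1, 0.01}

# Intermediate student dimensions implied by alpha = 2 and N = 4.
CHAIN_DIMS = [TEACHER_DIM // (ALPHA ** i) for i in range(1, NUM_ITERATIONS + 1)]
# -> [256, 128, 64, 32]


def soft_label_weight(epoch: int, lambda_0: float = LAMBDA_0, p: int = P) -> float:
    """One plausible 'dynamic adjustment' of the soft-label weight: decay it
    polynomially over training so hard (ground-truth) supervision dominates
    later epochs. The exact schedule used by IterDE may differ."""
    return lambda_0 * (1.0 - epoch / MAX_EPOCHS) ** p


def iteration_loss(hard_loss: torch.Tensor,
                   soft_loss: torch.Tensor,
                   epoch: int) -> torch.Tensor:
    """Total loss for one distillation iteration: the student's own KGE loss
    on gold triples plus the soft-label loss against the previous, larger
    model in the chain, weighted by the current soft-label weight."""
    return hard_loss + soft_label_weight(epoch) * soft_loss


def make_optimizer(params):
    """Adagrad optimizer as reported in the experiment setup."""
    return torch.optim.Adagrad(params, lr=LEARNING_RATE)
```

With α = 2 and N = 4, the 512-dimensional teacher would be distilled through intermediate students of dimension 256, 128, and 64 before reaching the final 32-dimensional student, which is the chain `CHAIN_DIMS` enumerates.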