Randomized Generation of Adversary-aware Fake Knowledge Graphs to Combat Intellectual Property Theft

Authors: Snow Kang, Cristian Molinaro, Andrea Pugliese, V. S. Subrahmanian
Pages: 4155-4163

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the efficacy of our algorithm on 3 diverse real-world datasets, showing that it achieves high levels of deception. (...) We run experiments on 3 knowledge graph datasets showing that CLIQUE-FAKEKG achieves good results in deceiving adversaries.
Researcher Affiliation | Academia | Snow Kang (1), Cristian Molinaro (2), Andrea Pugliese (2), V.S. Subrahmanian (1). (1) Dartmouth College, USA; (2) University of Calabria, Italy
Pseudocode | Yes | Algorithm 1 NAIVECLIQUE-FAKEKG (...) Algorithm 2 CLIQUECOMPUTATION (...) Algorithm 3 CLIQUE-FAKEKG
Open Source Code | No | For duplication purposes, the code, sample KGs, and sample outputs generated may be downloaded from https://dsaildartmouth.github.io/Fake KG.pdf. (Note: This URL points to the paper's PDF, not source code.)
Open Datasets | Yes | We used three datasets: Nation [4] (Kim, Xie, and Ong 2016), UMLS [5] (Kim, Xie, and Ong 2016), and the Microsoft FB15K-237 (FB for short) [6] (Toutanova et al. 2015). (...) [4] https://github.com/dongwookim-ml/kg-data/tree/master/nation (...) [5] https://github.com/dongwookim-ml/kg-data/tree/master/umls (...) [6] https://www.microsoft.com/en-us/download/details.aspx?id=52312
Dataset Splits | No | The paper describes the generation of 66 tests, each with 1 original and 9 fake KGs for human evaluation. However, it does not specify traditional train/validation/test dataset splits used for training or evaluating a machine learning model, as the experiment involves human subjects evaluating generated KGs.
Hardware Specification | Yes | We implemented the algorithm in Python on a 2.3 GHz Dual-Core Intel Core i5 with 8GB of LPDDR3 RAM, running Mac OS Catalina Version 10.15.6.
Software Dependencies | No | We implemented the algorithm in Python on a (...) running Mac OS Catalina Version 10.15.6. (Python version is not specified, and no other key software components with version numbers are provided.)
Experiment Setup | Yes | For each of the 3 datasets we extracted 22 original KGs and, for each KG, we computed 9 fake KGs; in particular, we computed 3 fake KGs for each of the following ranges of τ: [0, 1/3], [1/3, 2/3], and [2/3, 1]. (...) The set U was derived as follows: first, we randomly picked a subgraph of the original dataset; then, we built new KGs by randomly adding and deleting vertices/edges/labels to the KGs built so far. (...) We used the Jaccard distance function.
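
The experiment setup quoted above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a KG is represented as a set of (subject, predicate, object) triples and that the Jaccard distance is taken over those triples; the summary does not say which representation the paper uses. The names `jaccard_distance`, `tau_range_index`, and `perturb` are hypothetical, and `perturb` only mimics one "add and delete" step of the kind described for deriving the set U. How τ enters the algorithm itself is not covered here; the helper only reports which of the three quoted ranges a value falls into.

```python
import random
from fractions import Fraction
from typing import Iterable, Set, Tuple

# Assumption: a KG is a set of (subject, predicate, object) triples.
Triple = Tuple[str, str, str]

def jaccard_distance(a: Set[Triple], b: Set[Triple]) -> Fraction:
    """Jaccard distance 1 - |a & b| / |a | b|, kept exact with Fraction."""
    union = a | b
    if not union:
        return Fraction(0)  # two empty KGs are treated as identical
    return 1 - Fraction(len(a & b), len(union))

# The three ranges of tau quoted in the experiment setup.
TAU_RANGES = [
    (Fraction(0), Fraction(1, 3)),
    (Fraction(1, 3), Fraction(2, 3)),
    (Fraction(2, 3), Fraction(1)),
]

def tau_range_index(tau: Fraction) -> int:
    """Index of the first quoted range containing tau (boundaries are shared)."""
    for i, (low, high) in enumerate(TAU_RANGES):
        if low <= tau <= high:
            return i
    raise ValueError("tau must lie in [0, 1]")

def perturb(kg: Set[Triple], candidates: Iterable[Triple],
            rng: random.Random) -> Set[Triple]:
    """Hypothetical perturbation step: delete one random triple, add one new one."""
    result = set(kg)
    if result:
        result.remove(rng.choice(sorted(result)))
    extra = sorted(set(candidates) - result)
    if extra:
        result.add(rng.choice(extra))
    return result

# Counts stated in the summary: 3 datasets, 22 original KGs each, 9 fakes per KG.
NUM_TESTS = 3 * 22               # 66 tests
KGS_PER_TEST = 1 + 9             # 1 original plus 9 fake KGs
FAKES_PER_TEST = 3 * len(TAU_RANGES)  # 3 fakes for each of the 3 ranges

original = {("a", "likes", "b"), ("b", "likes", "c"), ("c", "likes", "a")}
fake = {("a", "likes", "b"), ("b", "likes", "c"),
        ("c", "hates", "a"), ("d", "likes", "a")}

distance = jaccard_distance(original, fake)
print(distance, tau_range_index(distance))  # 3/5 1
```

In this toy example the two KGs share 2 of 5 distinct triples, so the distance is 3/5, which falls in the middle range [1/3, 2/3]. Exact fractions are used so that the shared boundaries 1/3 and 2/3 compare without floating-point error.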