Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation
Authors: Mrigank Raman, Aaron Chan, Siddhant Agarwal, PeiFeng Wang, Hansen Wang, Sungchul Kim, Ryan Rossi, Handong Zhao, Nedim Lipka, Xiang Ren
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test KG-augmented models on their ability to maintain performance and explainability when the KG has been extensively perturbed. As explained in Sec. 2 and Fig. 1, the model is first trained on a given dataset using the original KG, frozen throughout KG perturbation, then used to compare downstream performance between original KG and perturbed KG. For all models, datasets, and perturbation methods, we measure performance and KG similarity when all |T| KG edges have been perturbed, averaged over three runs. |
| Researcher Affiliation | Collaboration | ¹Indian Institute of Technology, Delhi; ²University of Southern California; ³Indian Institute of Technology, Kharagpur; ⁴Tsinghua University; ⁵Adobe Research |
| Pseudocode | No | The paper describes the RL-RR algorithm and its components (actions, states, reward, DQN architecture) in detail, and includes Fig. 3, but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Code and data are available at https://github.com/INK-USC/deceive-KG-models. |
| Open Datasets | Yes | We evaluate on the CommonsenseQA (CSQA) (Talmor et al., 2018) and OpenBookQA (OBQA) (Mihaylov et al., 2018) datasets, using ConceptNet (Speer et al., 2016) as the KG. ... We evaluate these models on the Last.FM (Rendle, 2012) and MovieLens-20M (Harper & Konstan, 2016) datasets, using the item KG from Wang et al. (2019a). |
| Dataset Splits | Yes | Let Fθ be a KG-augmented model, and let (Xtrain, Xdev, Xtest) be a dataset for some downstream task. ... Furthermore, following Wang et al. (2018b), we simulate a cold start scenario by using only 20% and 40% of the train set for Last.FM and MovieLens-20M, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions BERT-Base and TransE embeddings, and the DQN algorithm, but does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For commonsense QA, the KG-augmented models we experiment with are RN (with attentional path aggregation) (Lin et al., 2019; Santoro et al., 2017) and MHGRN (Feng et al., 2020)... For both RN and MHGRN, we use a BERT-Base (Devlin et al., 2018) text encoder. ... For item recommendation, we use validation AUC as the reward function. ... we simulate a cold start scenario by using only 20% and 40% of the train set for Last.FM and MovieLens-20M, respectively. |
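The evaluation protocol quoted in the table (train and freeze the model, perturb all |T| KG edges, compare downstream performance, average over three runs) can be sketched in a few lines. This is a minimal illustration, not the paper's RL-RR method: `perturb_all_edges` uses a simple random relation-replacement baseline, and `frozen_model_accuracy` is a hypothetical callable standing in for the frozen downstream model's evaluation.

```python
import random

def perturb_all_edges(kg_triples, relations, seed=0):
    """Replace the relation of every (head, relation, tail) triple with a
    different randomly chosen relation. A simple random-perturbation
    baseline; the paper's RL-RR method instead learns perturbations with
    a DQN."""
    rng = random.Random(seed)
    perturbed = []
    for head, rel, tail in kg_triples:
        alternatives = [r for r in relations if r != rel]
        perturbed.append((head, rng.choice(alternatives), tail))
    return perturbed

def evaluate_under_perturbation(frozen_model_accuracy, kg_triples,
                                relations, runs=3):
    """Protocol from the table: the model stays frozen, the KG is perturbed
    on all |T| edges, and downstream performance is averaged over several
    runs (three in the paper). `frozen_model_accuracy` is a hypothetical
    callable mapping a KG (list of triples) to a scalar score."""
    scores = []
    for run in range(runs):
        perturbed_kg = perturb_all_edges(kg_triples, relations, seed=run)
        scores.append(frozen_model_accuracy(perturbed_kg))
    return sum(scores) / len(scores)
```

Averaging over seeds matters because each perturbation run draws different replacement relations; reporting a single run would conflate the perturbation method's effect with sampling noise.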