Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation
Authors: Mrigank Raman, Aaron Chan, Siddhant Agarwal, PeiFeng Wang, Hansen Wang, Sungchul Kim, Ryan Rossi, Handong Zhao, Nedim Lipka, Xiang Ren
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test KG-augmented models on their ability to maintain performance and explainability when the KG has been extensively perturbed. As explained in Sec. 2 and Fig. 1, the model is first trained on a given dataset using the original KG, frozen throughout KG perturbation, then used to compare downstream performance between original KG and perturbed KG. For all models, datasets, and perturbation methods, we measure performance and KG similarity when all |T| KG edges have been perturbed, averaged over three runs. |
| Researcher Affiliation | Collaboration | ¹Indian Institute of Technology, Delhi; ²University of Southern California; ³Indian Institute of Technology, Kharagpur; ⁴Tsinghua University; ⁵Adobe Research |
| Pseudocode | No | The paper describes the RL-RR algorithm and its components (actions, states, reward, DQN architecture) in detail, and includes Fig. 3, but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Code and data are available at https://github.com/INK-USC/deceive-KG-models. |
| Open Datasets | Yes | We evaluate on the CommonsenseQA (CSQA) (Talmor et al., 2018) and OpenBookQA (OBQA) (Mihaylov et al., 2018) datasets, using ConceptNet (Speer et al., 2016) as the KG. ... We evaluate these models on the Last.FM (Rendle, 2012) and MovieLens-20M (Harper & Konstan, 2016) datasets, using the item KG from Wang et al. (2019a). |
| Dataset Splits | Yes | Let Fθ be a KG-augmented model, and let (Xtrain, Xdev, Xtest) be a dataset for some downstream task. ... Furthermore, following Wang et al. (2018b), we simulate a cold start scenario by using only 20% and 40% of the train set for Last.FM and MovieLens-20M, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions BERT-Base and TransE embeddings, and the DQN algorithm, but does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For commonsense QA, the KG-augmented models we experiment with are RN (with attentional path aggregation) (Lin et al., 2019; Santoro et al., 2017) and MHGRN (Feng et al., 2020)... For both RN and MHGRN, we use a BERT-Base (Devlin et al., 2018) text encoder. ... For item recommendation, we use validation AUC as the reward function. ... we simulate a cold start scenario by using only 20% and 40% of the train set for Last.FM and MovieLens-20M, respectively. |
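The evaluation protocol quoted in the table (train and freeze the model, perturb all |T| KG edges, compare downstream performance, average over three runs) can be sketched in a few lines. This is a minimal illustration, not the paper's RL-RR method: `perturb_all_edges` uses a simple random relation-replacement baseline, and `frozen_model_accuracy` is a hypothetical callable standing in for the frozen downstream model's evaluation.

```python
import random

def perturb_all_edges(kg_triples, relations, seed=0):
    """Replace the relation of every (head, relation, tail) triple with a
    different randomly chosen relation. A simple random-perturbation
    baseline; the paper's RL-RR method instead learns perturbations with
    a DQN."""
    rng = random.Random(seed)
    perturbed = []
    for head, rel, tail in kg_triples:
        alternatives = [r for r in relations if r != rel]
        perturbed.append((head, rng.choice(alternatives), tail))
    return perturbed

def evaluate_under_perturbation(frozen_model_accuracy, kg_triples,
                                relations, runs=3):
    """Protocol from the table: the model stays frozen, the KG is perturbed
    on all |T| edges, and downstream performance is averaged over several
    runs (three in the paper). `frozen_model_accuracy` is a hypothetical
    callable mapping a KG (list of triples) to a scalar score."""
    scores = []
    for run in range(runs):
        perturbed_kg = perturb_all_edges(kg_triples, relations, seed=run)
        scores.append(frozen_model_accuracy(perturbed_kg))
    return sum(scores) / len(scores)
```

Averaging over seeds matters because each perturbation run draws different replacement relations; reporting a single run would conflate the perturbation method's effect with sampling noise.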