Influence Patterns for Explaining Information Flow in BERT

Authors: Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct an extensive empirical study of influence patterns for several NLP tasks: Subject-Verb Agreement (SVA), Reflexive Anaphora (RA), and Sentiment Analysis (SA). Our findings are summarized below."
Researcher Affiliation | Academia | "Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta. Electrical and Computer Engineering, Carnegie Mellon University, Mountain View, CA 94089"
Pseudocode | Yes | "The detailed algorithm of GPR and analysis of its optimality can be found in Appendix B.1 and B.2."
Open Source Code | No | "we will explore these limitations in future work and release our code and hope the proposed methods will serve as an insightful tool in future exploration."
Open Datasets | Yes | "We consider two groups of NLP tasks: (1) subject-verb agreement (SVA) and reflexive anaphora (RA)... (2) sentiment analysis (SA): we use 220 short examples (sentence length 17) from the evaluation set of the 2-class GLUE SST-2 sentiment analysis dataset [47]."
Dataset Splits | Yes | "For SST-2 we fine-tuned on the pretrained BERT_BASE [7] with L = 12, A = 12. We sample 1000 sentences from each subtask, evenly distributed across different sentence types (e.g. singular/plural subject & singular/plural intervening noun), with a fixed sentence structure."
Hardware Specification | Yes | "All computations are done with a Titan V on a machine with 64 GB of RAM. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this work."
Software Dependencies | No | The paper mentions using BERT models and references TensorFlow, but does not provide specific version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | "Let the target node for SVA and RA tasks be the output of the QoI score q(y) := y_correct - y_wrong. For instance, y_is - y_are for the sentence 'she [MASK] happy'. Similarly, we use y_positive - y_negative for sentiment analysis. We choose a uniform distribution over a linear path from x_b to x as the distribution D in Def. 3, where x_b is chosen as the input embedding of [MASK] because it can be viewed as a word with no information. For a given input token x_i, we apply GPR differently depending on the sign of the distributional influence g(x; q, D): if g(x; q, D) >= 0, we maximize the pattern influence towards q(y) at each iteration of the GPR; otherwise we maximize the pattern influence towards -q(y)."
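
For concreteness, here is a minimal PyTorch sketch of the distributional influence g(x; q, D) described in the Experiment Setup row: the QoI is q(y) = y_correct - y_wrong, and D is uniform over the linear path from the [MASK] baseline embedding x_b to the input embedding x. This is an illustrative reconstruction under stated assumptions, not the authors' released code; `model_fn`, the token ids, and the step count are hypothetical, and averaging path gradients is one standard way to estimate the expectation under D.

```python
import torch

def distributional_influence(model_fn, x, x_b, correct_id, wrong_id, steps=32):
    """Expected gradient of the QoI q(y) = y_correct - y_wrong under a
    uniform distribution over the linear path from x_b to x.

    model_fn: hypothetical callable mapping an embedding tensor to the
              output logits at the [MASK] position.
    x, x_b:   input embedding and [MASK] baseline embedding (same shape).
    """
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Point on the linear path from the baseline to the input,
        # tracked for autograd.
        x_alpha = (x_b + alpha * (x - x_b)).detach().requires_grad_(True)
        logits = model_fn(x_alpha)
        q = logits[correct_id] - logits[wrong_id]  # QoI score q(y)
        q.backward()
        grads += x_alpha.grad
    return grads / steps  # Monte Carlo estimate of the expected gradient
```

Per the setup above, the sign of this influence would then pick the GPR objective for each input token: non-negative influence means refining patterns that maximize influence towards q(y), and negative influence towards -q(y).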