Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking
Authors: Michael Sejr Schlichtkrull, Nicola De Cao, Ivan Titov
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that such a classifier can be trained in a fully differentiable fashion, employing stochastic gates and encouraging sparsity through the expected L0 norm. We use our technique as an attribution method to analyse GNN models for two tasks, question answering and semantic role labelling, providing insights into the information flow in these models. We demonstrate using artificial data the shortcomings of the closest existing method, and show how our method addresses those shortcomings and improves faithfulness. We use GRAPHMASK to analyse GNN models for two NLP tasks: semantic role labeling (Marcheggiani & Titov, 2017) and multi-hop question answering (De Cao et al., 2019). (A sketch of such a stochastic gate appears below the table.) |
| Researcher Affiliation | Academia | 1 University of Amsterdam, 2 University of Edinburgh. m.s.schlichtkrull@uva.nl, n.decao@uva.nl, ititov@inf.ed.ac.uk |
| Pseudocode | No | Not found. The paper provides mathematical equations describing the GNN and GRAPHMASK formulations (e.g., Equations 1-13) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code available at https://github.com/MichSchli/GraphMask. |
| Open Datasets | Yes | SRL: We used the English CoNLL-2009 shared task dataset (Hajič et al., 2009). This dataset contains 179,014 training predicates, 6390 validation predicates, and 10498 test predicates. The dataset can be accessed at https://ufal.mff.cuni.cz/conll2009-st/. QA: For question answering, we used the WikiHop dataset (Welbl et al., 2018), and the preprocessing script from De Cao et al. (2019). See Table 4 for details. The dataset can be accessed at https://qangaroo.cs.ucl.ac.uk/. |
| Dataset Splits | Yes | SRL: We used the English CoNLL-2009 shared task dataset (Hajič et al., 2009). This dataset contains 179,014 training predicates, 6390 validation predicates, and 10498 test predicates. |
| Hardware Specification | Yes | We carried out all experiments on a single Titan X-GPU. |
| Software Dependencies | No | Not found. The paper reports optimizers and learning rates (Adam (Kingma & Ba, 2015) with initial learning rate 1e-4 for GRAPHMASK, and RMSProp (Tieleman & Hinton, 2012) with learning rate 1e-2 for λ) but does not list software libraries or version requirements. |
| Experiment Setup | Yes | When training GRAPHMASK, we found it helpful to employ a regime wherein gates are progressively added to layers, starting from the top. For a model with K layers, we begin by adding gates only for layer k, and train the parameters for these gates for δ iterations. We then add gates for the next layer k-1, train all sets of gates for another δ iterations, and continue downwards in this manner. Optimising for sparsity under the performance constraint using the development set, we found the method to perform best with δ = 1 for SRL, while the optimal setting for QA was δ = 3. We found it necessary to use separate optimizers and learning rates for the Lagrangian λ parameter and for the parameters of GRAPHMASK. Thus, we employ Adam (Kingma & Ba, 2015) with initial learning rate 1e-4 for GRAPHMASK, and RMSProp (Tieleman & Hinton, 2012) with learning rate 1e-2 for λ. For the tolerance parameter β, we found β = 0.03 to perform well for all tasks. (A schematic of this training loop appears below the table.) |
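
The "stochastic gates and expected L0 norm" quoted in the Research Type row refer to the hard-concrete style gating common in this line of work (Louizos et al., 2018). The sketch below illustrates that construction under that assumption: a single free logit per edge stands in for GRAPHMASK's erasure classifier (which in the paper predicts gates from the message-passing hidden states), so this is not the authors' implementation; their code is at the repository linked above.

```python
import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    """Stochastic gate with a hard-concrete relaxation and an expected-L0 penalty.

    Illustrative only: one free logit per edge replaces the erasure classifier
    that GRAPHMASK trains to predict gates from hidden states.
    """

    def __init__(self, num_edges, temperature=0.33, gamma=-0.2, zeta=1.2):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_edges))
        self.temperature = temperature
        self.gamma, self.zeta = gamma, zeta  # stretch interval, allows exact 0/1

    def forward(self):
        if self.training:
            # Reparameterised sample from the concrete distribution.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (u.log() - (1 - u).log() + self.log_alpha) / self.temperature
            )
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch and rectify so a gate can reach exactly 0 (edge dropped).
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self):
        # Probability that each gate is non-zero, summed: the sparsity penalty.
        shift = self.temperature * math.log(-self.gamma / self.zeta)
        return torch.sigmoid(self.log_alpha - shift).sum()

# Usage sketch: dropped messages are replaced by a learned baseline vector,
# in the spirit of the paper's gated message msg' = z * msg + (1 - z) * b.
# z = HardConcreteGate(num_edges=messages.size(0))().unsqueeze(-1)
# gated_messages = z * messages + (1 - z) * baseline
```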
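The Experiment Setup row describes constrained optimisation with a Lagrangian multiplier λ, a tolerance β = 0.03, Adam (lr 1e-4) for the gate parameters, and RMSProp (lr 1e-2) for λ. The loop below is a schematic rendering of that description; `model`, `graphmask`, `loader`, and `task_divergence` are hypothetical placeholders, and the progressive, top-down enabling of gates every δ iterations is noted only in a comment.

```python
import torch

# Hypothetical placeholders: `model` runs the frozen GNN (optionally with gates),
# `graphmask` holds gate/baseline parameters, `loader` yields batches, and
# `task_divergence` measures how far masked predictions drift from the originals.

beta = 0.03                                   # tolerance on the divergence constraint
lagrangian_lambda = torch.zeros(1, requires_grad=True)

opt_gates = torch.optim.Adam(graphmask.parameters(), lr=1e-4)
opt_lambda = torch.optim.RMSprop([lagrangian_lambda], lr=1e-2)

# The paper additionally enables gates layer by layer from the top, training for
# another delta iterations after each layer is added (delta = 1 for SRL, 3 for QA).
for batch in loader:
    with torch.no_grad():
        original_out = model(batch)                       # unmasked predictions
    masked_out, expected_l0 = model(batch, gates=graphmask)

    constraint = task_divergence(masked_out, original_out) - beta
    loss = expected_l0 + lagrangian_lambda * constraint   # Lagrangian relaxation

    opt_gates.zero_grad()
    opt_lambda.zero_grad()
    loss.backward()
    opt_gates.step()                        # descent on the gate parameters
    lagrangian_lambda.grad *= -1.0          # ascent on lambda: flip its gradient
    opt_lambda.step()
    lagrangian_lambda.data.clamp_(min=0.0)  # the multiplier stays non-negative
```

The two optimisers mirror the quoted setup: the gate parameters minimise the expected L0 penalty subject to the divergence constraint, while λ is updated in the opposite direction so the constraint is enforced rather than ignored.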