GPEX, A Framework For Interpreting Artificial Neural Networks

Authors: Amir Hossein Hosseini Akbarnejad, Gilbert Bigras, Nilanjan Ray

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using our method we find out that on 5 datasets, only a subset of those theoretical assumptions are sufficient. Indeed, in our experiments we used a normal ResNet-18 or feed-forward backbone with a single wide layer in the end. On 5 datasets (4 image datasets and 1 biological dataset) and ANNs with 2 types of functionality (classifier or attention mechanism) we were able to find GPs whose outputs closely match those of the corresponding ANNs.
Researcher Affiliation | Academia | Amir Akbarnejad, Department of Computing Science, University of Alberta, Edmonton, AB, Canada (ah8@ualberta.ca); Gilbert Bigras, Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB, Canada (Gilbert.Bigras@albertaprecisionlabs.ca); Nilanjan Ray, Department of Computing Science, University of Alberta, Edmonton, AB, Canada (nray1@ualberta.ca).
Pseudocode | Yes | The paper provides Algorithm 1 (Optim_KernMappings) and Algorithm 2 (Explain_ANN), reproduced below (see the first sketch after the table):

Algorithm 1 (Optim_KernMappings)
  Input: input instance x, inducing instance x̃, list of matrices U, list of vectors V.
  Output: kernel-space mappings [f_1(.), ..., f_L(.)].
  Initialisation: loss ← 0.
  1: µ, cov ← forward_GP(x, x̃, U, V)   // feed x to the GPs; forward_GP is Alg. S1 in the supplementary
  2: µ_ann ← g(x)                       // feed x to the ANN
  3: for ℓ = 1 to L do
  4:   loss ← loss + ((µ[ℓ] − µ_ann[ℓ])² + σ_g²) / cov[ℓ] + log(cov[ℓ])   // Eq. 5
  5: end for
  6: δ ← ∂loss / ∂params([f_1(.), ..., f_L(.)])   // the gradient of the loss
  7: params([f_1, ..., f_L]) ← params([f_1, ..., f_L]) − lr · δ   // update the parameters
  8: lr ← updated learning rate
  9: return [f_1(.), ..., f_L(.)]

Algorithm 2 (Explain_ANN)
  Input: training dataset ds_train and inducing dataset ds_inducing.
  Output: updated kernel-space mappings [f_1(.), ..., f_L(.)] and the other GP parameters U and V.
  Initialisation: U, V ← Init_GPparams(ds_inducing)   // Alg. S3 in the supplementary
  1: for iter = 1 to max_iter do
  2:   x ← randselect(ds_train)
  3:   x̃ ← randselect(ds_inducing)
  4:   [f_1(.), ..., f_L(.)] ← Optim_KernMappings(x, x̃, U, V)
  5:   x̃ ← randselect(ds_inducing)
  6:   for ℓ = 1 to L do
  7:     // update kernel-space representations
  8:     U[ℓ][x̃.index] ← f_ℓ(x̃)
  9:   end for
  10: end for
  11: return [f_1(.), ..., f_L(.)], U, V
Open Source Code | Yes | We implement our method as a publicly available tool called GPEX: https://github.com/amirakbarnejad/gpex.
Open Datasets | Yes | We conducted several experiments on four publicly available datasets: MNIST [9], Cifar10 [19], Kather [15], and Dogs Wolves [34].
Dataset Splits | No | No explicit mention of a validation split or how it is used for hyperparameter tuning. The paper states: 'For MNIST [9] and Cifar10 [19] we used the standard split to training and test sets provided by the datasets. For Kather [15] and Dogs Wolves [34] we randomly selected 70% and 80% of instances as our training set.' (See the second sketch after the table.)
Hardware Specification | No | The paper mentions 'GPU-accelaration' ('Our package makes use of GPU-accelaration') but does not specify particular GPU models (e.g., NVIDIA, AMD, Intel), CPU models, or memory used for the experiments. An RTX 3090 GPU is mentioned only in the context of a related work's limitation, not the authors' own experimental setup.
Software Dependencies | No | The paper mentions software like a 'PyTorch module', a 'python library called GPEX', and external implementations (e.g., for ResNet [2], influence functions [1]) but does not provide specific version numbers for these software components or programming languages (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | The exact parameter settings for running Alg. 2 are elaborated upon in Sec. S5 of the supplementary. For Cifar10 [19], we used the exact optimizer suggested by [2]. For other datasets we used an Adam [17] optimizer with a learning rate of 0.0001. We trained the pipelines for 20, 200, 20, and 20 epochs on MNIST [9], Cifar10 [19], Kather [15], and Dogs Wolves [34], respectively. For all datasets, we set the width (i.e. the number of neurons) of the second last fully-connected layer to 1024. (See the third sketch after the table.)
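
First sketch: the step-4 objective in Algorithm 1 can be written directly in PyTorch. This is a minimal sketch, not GPEX's actual implementation: the function name kernel_mapping_loss, the scalar per-layer tensors, and the value of sigma_g are illustrative assumptions, and the fraction form of the Eq. 5 term follows the reconstruction shown in the Pseudocode row.

```python
import torch

def kernel_mapping_loss(mu, cov, mu_ann, sigma_g=0.1):
    # Per-layer term of Eq. 5 as reconstructed above:
    # ((mu - mu_ann)^2 + sigma_g^2) / cov + log(cov), summed over layers.
    loss = torch.zeros(())
    for l in range(len(mu)):
        loss = loss + ((mu[l] - mu_ann[l]) ** 2 + sigma_g ** 2) / cov[l] \
                    + torch.log(cov[l])
    return loss

# Hypothetical toy usage with L = 2 layers and scalar outputs per layer.
# In GPEX, mu and cov come from the GP posterior and mu_ann from the ANN;
# here they are stand-in tensors so the snippet runs on its own.
mu     = [torch.tensor(0.20, requires_grad=True), torch.tensor(-0.10, requires_grad=True)]
cov    = [torch.tensor(0.50), torch.tensor(0.80)]
mu_ann = [torch.tensor(0.25), torch.tensor(-0.30)]

loss = kernel_mapping_loss(mu, cov, mu_ann)
loss.backward()  # step 6: gradient of the loss w.r.t. the (stand-in) GP-side parameters
```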
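Second sketch: one way to reproduce the reported splits with torchvision and torch.utils.data.random_split, i.e. the standard train/test split for CIFAR-10 and a random 70% training split for Kather. The data directory, the seed, and the use of ImageFolder for Kather are assumptions; the paper does not describe a held-out validation set.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# MNIST and CIFAR-10: the paper uses the standard train/test splits.
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True,
                               transform=transforms.ToTensor())
cifar_test = datasets.CIFAR10(root="./data", train=False, download=True,
                              transform=transforms.ToTensor())

# Kather: a random 70% training split (80% for Dogs vs. Wolves).
# The directory name and the seed below are illustrative assumptions.
kather = datasets.ImageFolder("./data/kather", transform=transforms.ToTensor())
n_train = int(0.7 * len(kather))
train_set, test_set = random_split(
    kather, [n_train, len(kather) - n_train],
    generator=torch.Generator().manual_seed(0),
)
```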
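Third sketch: the Experiment Setup row reports a 1024-unit second-to-last fully connected layer and an Adam optimizer with learning rate 0.0001. The snippet below wires those two settings onto a standard torchvision ResNet-18; replacing resnet.fc with a two-layer head and the number of classes are assumptions about how the backbone was set up, not the authors' exact pipeline.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet18

NUM_CLASSES = 10  # e.g. MNIST / CIFAR-10; adjust per dataset

# ResNet-18 backbone with a 1024-unit second-to-last fully connected layer.
# Replacing `fc` this way is an assumed realization of the "single wide layer
# in the end" mentioned in the Research Type row.
backbone = resnet18(weights=None)
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 1024),  # wide penultimate layer
    nn.ReLU(),
    nn.Linear(1024, NUM_CLASSES),
)

# Adam with lr = 0.0001, as reported for MNIST, Kather, and Dogs Wolves
# (CIFAR-10 instead followed the optimizer suggested by the ResNet reference [2]).
optimizer = optim.Adam(backbone.parameters(), lr=1e-4)
```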