Layerwise Change of Knowledge in Neural Networks

Authors: Xu Cheng, Lei Cheng, Zhaoran Peng, Yang Xu, Tian Han, Quanshi Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments to illustrate the sparsity of interactions. Given a well-trained DNN and an input sample x ∈ R^n, we calculated AND interactions I_and(S|x) and OR interactions I_or(S|x) for all 2^n possible subsets S ⊆ N. To this end, we trained VGG-11 (Simonyan & Zisserman, 2014) and ResNet-20 (He et al., 2016) on the MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) datasets, respectively. (A sketch of this subset-enumeration computation appears after the table.)
Researcher Affiliation | Academia | Nanjing University of Science and Technology; Shanghai Jiao Tong University; Zhejiang University; Stevens Institute of Technology.
Pseudocode | No | The paper presents mathematical formulations and descriptions of its approach, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | We trained VGG-11 (Simonyan & Zisserman, 2014) and ResNet-20 (He et al., 2016) on the MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) datasets, respectively. We also fine-tuned pre-trained DistilBERT (Sanh et al., 2019) and BERT-base (Devlin et al., 2019) models on the SST-2 dataset (Socher et al., 2013) for binary sentiment classification. (Typical public sources for these datasets are shown in a sketch after the table.)
Dataset Splits | No | The paper describes training models on standard datasets, but it does not give explicit training/validation/test splits (e.g., validation-set percentages) or cite predefined validation splits.
Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., GPU/CPU models, processor speeds, or memory).
Software Dependencies | No | The paper names optimizers and models such as SGD and BERT, but it does not give version numbers for any libraries, frameworks, or solvers (e.g., a PyTorch or TensorFlow version).
Experiment Setup | Yes | For MLP-7 trained on the MNIST dataset, we used SGD with learning rate 0.01, and set the batch size to 256 to train the intermediate layers... For DistilBERT fine-tuned on the SST-2 dataset, we used SGD with learning rate 2e-5, and set the batch size to 32 to train the intermediate layers. (The quoted hyperparameters are mirrored in a configuration sketch after the table.)
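
To make the Research Type row concrete, here is a minimal Python sketch of the subset-enumeration computation it quotes, assuming the standard Harsanyi-dividend form of AND interactions, I_and(S|x) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), where v(T) is the model output on x with the variables outside T masked. The value function, masking scheme, and toy example below are illustrative assumptions, not taken from the paper; OR interactions are defined analogously on a complementary value function.

```python
from itertools import combinations

def and_interactions(v, n):
    """Enumerate I_and(S|x) over all 2^n subsets S of N = {0, ..., n-1},
    assuming I_and(S|x) = sum_{T subseteq S} (-1)^(|S|-|T|) * v(T).
    `v` maps a frozenset T to a scalar (e.g., the network output on x
    with variables outside T masked); that choice is an assumption here."""
    N = list(range(n))
    # Cache v(T) for every subset T (up to 2^n forward passes).
    v_cache = {}
    for r in range(n + 1):
        for T in combinations(N, r):
            v_cache[frozenset(T)] = v(frozenset(T))

    I_and = {}
    for r in range(n + 1):
        for S in combinations(N, r):
            S = frozenset(S)
            total = 0.0
            for k in range(len(S) + 1):
                for T in combinations(sorted(S), k):
                    total += (-1) ** (len(S) - k) * v_cache[frozenset(T)]
            I_and[S] = total
    return I_and

if __name__ == "__main__":
    def toy_v(T):
        # Placeholder for "network output with variables outside T masked".
        return float(len(T)) + (1.0 if {0, 1} <= T else 0.0)
    print(and_interactions(toy_v, 3)[frozenset({0, 1})])  # -> 1.0
```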
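
All datasets named in the Open Datasets row are publicly available. The snippet below shows typical ways to obtain them with torchvision and the Hugging Face `datasets` library; the authors' actual preprocessing, splits, and tooling are not stated in this section, so treat the arguments as illustrative.

```python
from torchvision import datasets as tv_datasets, transforms
from datasets import load_dataset  # Hugging Face `datasets` package

to_tensor = transforms.ToTensor()
# Image classification datasets (downloaded on first use).
mnist = tv_datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
cifar10 = tv_datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
# SST-2 for binary sentiment classification, via the GLUE benchmark.
sst2 = load_dataset("glue", "sst2")
```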
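
The Experiment Setup row quotes optimizers, learning rates, and batch sizes but no framework. The sketch below mirrors those quoted values in PyTorch under assumed details: the MLP-7 architecture, the random stand-in data, and the plain training loop are placeholders, since the paper's layer-wise training procedure is not reproduced here.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the paper's MLP-7; the exact architecture is an assumption.
mlp7 = nn.Sequential(
    nn.Flatten(),
    *[layer for _ in range(6) for layer in (nn.Linear(784, 784), nn.ReLU())],
    nn.Linear(784, 10),
)

# Random tensors as a placeholder for the MNIST training split.
fake_mnist = TensorDataset(torch.randn(1024, 1, 28, 28), torch.randint(0, 10, (1024,)))

# Quoted hyperparameters for MLP-7 on MNIST: SGD, learning rate 0.01, batch size 256.
loader = DataLoader(fake_mnist, batch_size=256, shuffle=True)
optimizer = optim.SGD(mlp7.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(mlp7(images), labels)
    loss.backward()
    optimizer.step()

# The DistilBERT / SST-2 setting is analogous with the quoted values:
# optimizer = optim.SGD(distilbert.parameters(), lr=2e-5)  # batch size 32
```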