Layerwise Change of Knowledge in Neural Networks
Authors: Xu Cheng, Lei Cheng, Zhaoran Peng, Yang Xu, Tian Han, Quanshi Zhang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments to illustrate the sparsity of interactions. Given a well-trained DNN and an input sample x ∈ ℝ^n, we calculated AND interactions I_and(S\|x) and OR interactions I_or(S\|x) of all 2^n possible subsets S ⊆ N. To this end, we trained VGG-11 (Simonyan & Zisserman, 2014) and ResNet-20 (He et al., 2016) on the MNIST dataset (LeCun et al., 1998) and CIFAR-10 dataset (Krizhevsky et al., 2009), respectively. (A subset-enumeration sketch of this computation appears after the table.) |
| Researcher Affiliation | Academia | ¹Nanjing University of Science and Technology, ²Shanghai Jiao Tong University, ³Zhejiang University, ⁴Stevens Institute of Technology. |
| Pseudocode | No | The paper presents mathematical formulations and descriptions of its approach, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We trained VGG-11 (Simonyan & Zisserman, 2014) and ResNet-20 (He et al., 2016) on the MNIST dataset (LeCun et al., 1998) and CIFAR-10 dataset (Krizhevsky et al., 2009), respectively. We also fine-tuned pre-trained DistilBERT (Sanh et al., 2019) and BERT-base (Devlin et al., 2019) models on the SST-2 dataset (Socher et al., 2013) for binary sentiment classification. |
| Dataset Splits | No | The paper describes training models on standard datasets, but it does not provide specific details on training/validation/test splits, such as explicit validation-set percentages, nor does it cite predefined validation splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like SGD and BERT models but does not provide specific version numbers for any libraries, frameworks, or solvers used (e.g., PyTorch version, TensorFlow version). |
| Experiment Setup | Yes | For MLP-7 trained on the MNIST dataset, we used SGD with learning rate 0.01 and set the batch size to 256 to train the intermediate layers... For DistilBERT fine-tuned on the SST-2 dataset, we used SGD with learning rate 2e-5 and set the batch size to 32 to train the intermediate layers. (A minimal configuration sketch of these settings follows the table.) |
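
For concreteness, here is a minimal sketch of the exhaustive interaction computation quoted in the Research Type row: it enumerates all 2^n subsets S ⊆ N of input variables and computes AND interactions I_and(S|x) via a standard Harsanyi-style inclusion-exclusion sum. The masking convention, the output function `v`, and the toy example are assumptions for illustration, not the authors' released code (none is available per the Open Source Code row).

```python
from itertools import combinations

def subsets(variables):
    """Yield every subset (as a tuple) of the given variable indices: 2^n in total."""
    for r in range(len(variables) + 1):
        yield from combinations(variables, r)

def and_interactions(v, n):
    """
    Compute AND interactions I_and(S|x) for every subset S of N = {0, ..., n-1}.

    `v(S)` must return the scalar network output when only the variables in S
    are kept and all others are masked with a baseline value. The
    inclusion-exclusion sum below is the standard Harsanyi-dividend form and is
    an assumption about the paper's exact definition.
    """
    N = tuple(range(n))
    v_cache = {S: v(S) for S in subsets(N)}          # 2^n masked forward passes
    return {
        S: sum((-1) ** (len(S) - len(T)) * v_cache[T] for T in subsets(S))
        for S in subsets(N)
    }

# Toy usage with a hypothetical scalar "network" on n = 3 masked inputs.
if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]

    def v(kept):
        out = sum(x[i] for i in kept)                 # additive main effects
        if 0 in kept and 2 in kept:                   # an explicit AND effect between variables 0 and 2
            out += 1.0
        return out

    for S, val in and_interactions(v, 3).items():
        print(S, round(val, 4))
```

Because the inner sum runs over all subsets of every S, the cost grows roughly as 3^n, so exhaustive enumeration is only tractable when a small number n of input variables is analyzed per sample.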
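Similarly, the optimizer settings quoted in the Experiment Setup row map onto a short PyTorch-style configuration. Only the learning rates and batch sizes come from the paper; the model and dataset objects and the helper name `make_training_setup` are hypothetical placeholders.

```python
import torch
from torch.utils.data import DataLoader

# Hyperparameters reported in the paper; everything else here is a placeholder.
CONFIGS = {
    "mlp7_mnist":      {"lr": 0.01, "batch_size": 256},
    "distilbert_sst2": {"lr": 2e-5, "batch_size": 32},
}

def make_training_setup(model, dataset, config_name):
    """Return the SGD optimizer and data loader for one reported configuration."""
    cfg = CONFIGS[config_name]
    optimizer = torch.optim.SGD(model.parameters(), lr=cfg["lr"])
    loader = DataLoader(dataset, batch_size=cfg["batch_size"], shuffle=True)
    return optimizer, loader
```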