Great Minds Think Alike: The Universal Convergence Trend of Input Salience

Authors: Yipei Wang, Jeffrey Siskind, Xiaoqian Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments shed light on the significant implications of our hypotheses in various application domains, including black-box attacks, deep ensembles, etc. These findings not only enhance our understanding of DNN behaviors but also offer valuable insights for their practical application in diverse areas of deep learning. |
| Researcher Affiliation | Academia | Yipei Wang, Jeffrey Mark Siskind, Xiaoqian Wang; Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907; {wang4865,qobi,joywang}@purdue.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is provided as the supplementary material in the submission. The repository will be publicized upon acceptance. |
| Open Datasets | Yes | Here we mainly follow the setups of the benign overfitting (Nakkiran et al., 2021), which also present a comprehensive study of optimized DNNs through CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009). Besides, we also include Tiny ImageNet-200 (Le and Yang, 2015) as a compromise between the computational efficiency and the dataset complexity. |
| Dataset Splits | No | The paper states 'experiments are carried out over the test set X = X_test, Y = Y_test' but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit standard split references for all three parts). |
| Hardware Specification | Yes | They are carried out on Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz with Quadro RTX 6000 GPUs. |
| Software Dependencies | No | The paper mentions using 'stochastic gradient descent (SGD) as the solver' but does not specify key software components with version numbers (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | As for the training process, following Nakkiran et al. (2021), we use stochastic gradient descent (SGD) as the solver, with a batch size of 128. The input data are normalized, but not augmented. We start with the initial learning rate γ0 = 0.1 and update it with γt = γ0/√(1 + t), where t is the epoch. Please refer to Appendix B for more experimental details. |
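
The training recipe quoted in the Experiment Setup row maps onto a standard image-classification training loop. Below is a minimal sketch of that configuration, assuming PyTorch/torchvision (the paper does not name its framework, per the Software Dependencies row); the ResNet-18 backbone, the CIFAR-10 normalization statistics, and the 100-epoch budget are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of the quoted setup: SGD, batch size 128, normalized (not
# augmented) inputs, initial learning rate 0.1 decayed as gamma_0 / sqrt(1 + t).
import math

import torch
import torchvision
import torchvision.transforms as T

# Normalized but not augmented inputs (normalization statistics are assumed).
transform = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # architecture is illustrative
criterion = torch.nn.CrossEntropyLoss()

# SGD with gamma_0 = 0.1; LambdaLR multiplies the initial rate by 1/sqrt(1 + t)
# each epoch, giving gamma_t = gamma_0 / sqrt(1 + t) as in the row above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda t: 1.0 / math.sqrt(1 + t))

for epoch in range(100):  # epoch budget is an assumption
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The `test_set` loaded above corresponds to the Dataset Splits row, which notes that the paper's evaluations are carried out over the test set; any validation split would have to be carved out separately, since the paper does not specify one.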