Neuron with Steady Response Leads to Better Generalization

Authors: Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, Dongmei Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on Multilayer Perceptron, Convolutional Neural Networks, and Graph Neural Networks with popular benchmark datasets of diverse domains, which show that our Neuron Steadiness Regularization consistently outperforms the vanilla version of models with significant gain and low additional computational overhead." (A hedged sketch of an intra-class variance penalty in this spirit is given after the table.)
Researcher Affiliation | Collaboration | Qiang Fu (Microsoft Research Asia, Beijing, China, qifu@microsoft.com); Lun Du (Microsoft Research Asia, Beijing, China, lun.du@microsoft.com); Haitao Mao (Michigan State University, Michigan, U.S., haitaoma@msu.edu); Xu Chen (Microsoft Research Asia, Beijing, China, xu.chen@microsoft.com); Wei Fang (Tsinghua University, Beijing, China, fw17@mails.tsinghua.edu.cn); Shi Han (Microsoft Research Asia, Beijing, China, shihan@microsoft.com); Dongmei Zhang (Microsoft Research Asia, Beijing, China, dongmeiz@microsoft.com)
Pseudocode | No | The paper describes its algorithms and methods using mathematical equations and textual explanations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We share the corresponding link in Appendix ??" and "Appendix A.1 Code and Data: Code: https://github.com/microsoft/NeuronSteadinessRegularization"
Open Datasets | Yes | "MLP and CNN are applied to the image recognition task on MNIST [22], CIFAR-10 [19] and ImageNet [6] datasets, respectively. GNN is applied to node classification on four real-world graph datasets: WikiCS [30], PubMed [47], Amazon-Photo and Amazon-Computers [41]." and "For MNIST dataset, we divide 60000 training images into the training set with 50000 samples and the validation set with the remaining 10000 samples to select hyper-parameters. For each of the four graph datasets, it is randomly split into training, validation, and testing sets with a ratio of 6:2:2."
Dataset Splits | Yes | "For MNIST dataset, we divide 60000 training images into the training set with 50000 samples and the validation set with the remaining 10000 samples to select hyper-parameters. For each of the four graph datasets, it is randomly split into training, validation, and testing sets with a ratio of 6:2:2."
Hardware Specification | Yes | "All experiments are conducted on machines with NVIDIA GeForce RTX 3090 GPUs."
Software Dependencies | No | The paper mentions software such as the torchvision library and optimizers like SGD and Adam, but it does not specify version numbers for any key software components or libraries required for reproduction.
Experiment Setup | Yes | "Experiment Settings. We follow the typical implementation settings to conduct our experiments. For ResNet-18 and VGG19 on CIFAR-10, we follow the detailed setting of [48, 9]. For ResNet-50 on ImageNet, we follow the official implementation provided by the torchvision library. For GraphSAGE and GCN, we follow the implementation setting of [28]. For MNIST dataset, we divide 60000 training images into the training set with 50000 samples and the validation set with the remaining 10000 samples to select hyper-parameters. For each of the four graph datasets, it is randomly split into training, validation, and testing sets with a ratio of 6:2:2. Note that SGD [40] is used to optimize MLP, and Adam [16] for the other models, except for ResNet-18, which is optimized by Momentum [40] according to the implementation setting of [48]. We use the typical setting of batch size as 100 for all experiments. To ensure model convergence, training epochs are set as 100 for both MLP and GNN, 200 for ResNet, and 500 for VGG. In all experiments except for RQ4, we apply NSR to only one particular layer with the same λ for each neuron. For this single hyper-parameter λ, like most other regularization methods, we apply a random search strategy to find its proper value, ranging from 1e-2 to 10. The error rate is the evaluation metric and each result is averaged over 5 runs with different random seeds." (Hedged sketches of the 6:2:2 split and the λ random search are also given after the table.)
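The paper's exact NSR formulation is not reproduced in this summary, so the snippet below is only a minimal sketch of an intra-class response-variance penalty applied to a single hidden layer, which matches how the rows above describe NSR being used (one layer, one shared coefficient λ). The function name `nsr_penalty`, the model returning its hidden activations, and the per-class averaging are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def nsr_penalty(hidden, labels, num_classes):
    """Hypothetical intra-class variance penalty on one layer's responses.

    hidden: (batch, n_neurons) activations of the regularized layer
    labels: (batch,) class indices
    Returns the summed per-neuron variance of responses within each class,
    averaged over the classes present in the batch.
    """
    penalty = hidden.new_zeros(())
    present = 0
    for c in range(num_classes):
        mask = labels == c
        if mask.sum() < 2:  # variance needs at least two samples per class
            continue
        class_resp = hidden[mask]                                 # (n_c, n_neurons)
        penalty = penalty + class_resp.var(dim=0, unbiased=False).sum()
        present += 1
    return penalty / max(present, 1)

def training_step(model, x, y, lam, num_classes, optimizer):
    # Assumes the model also returns the chosen layer's activations.
    logits, hidden = model(x)
    loss = F.cross_entropy(logits, y) + lam * nsr_penalty(hidden, y, num_classes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A vectorized per-class mean (scatter/segment operations instead of the Python loop) would keep the extra cost small, consistent with the low additional computational overhead claimed in the Research Type row.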
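The split and hyper-parameter search quoted in the Dataset Splits and Experiment Setup rows are concrete enough to sketch. The snippet below assumes a node-classification setting with an arbitrary number of nodes, a log-uniform draw for λ over [1e-2, 10] (the paper only says "random search" over that range), and a hypothetical `run_fn(lam, seed)` callback that trains one model and returns its error rate; none of this is taken from the released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_622(num_nodes, rng):
    """Random 6:2:2 split into train / validation / test indices, as quoted above."""
    perm = rng.permutation(num_nodes)
    n_train = int(0.6 * num_nodes)
    n_val = int(0.2 * num_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def sample_lambda(rng, low=1e-2, high=10.0):
    """One random-search draw for the NSR coefficient λ (log-uniform is an assumption)."""
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))

def mean_error_rate(run_fn, lam, seeds=(0, 1, 2, 3, 4)):
    """Each configuration is scored as the mean error rate over 5 random seeds."""
    return float(np.mean([run_fn(lam, seed) for seed in seeds]))
```

Usage would follow the quoted protocol: draw several λ values with `sample_lambda`, evaluate each with `mean_error_rate` on the validation split produced by `split_622`, and report the test error rate of the best configuration.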