Theoretical Investigation of Generalization Bound for Residual Networks

Authors: Hao Chen, Zhanfeng Mo, Zhouwang Yang, Xiao Wang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments also verify that ResNet structures contribute to better generalization properties. (Section 7: Numerical Experiments)
Researcher Affiliation | Academia | Hao Chen (1), Zhanfeng Mo (1), Zhouwang Yang (1), and Xiao Wang (2); (1) University of Science and Technology of China; (2) Purdue University; {ch330822, oscarmzf}@mail.ustc.edu.cn, yangzw@ustc.edu.cn, wangxiao@purdue.edu
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It focuses on mathematical formulations and theoretical derivations.
Open Source Code | No | The rest of experiment data is concluded in the supplementary material [Mo and Chen, 2019]. (This statement refers to 'experiment data' in the supplementary material, not source code for the methodology.)
Open Datasets | No | We sample 500 random samples $\{(x_i, y_i)\}_{i=1}^{500} \subset \mathbb{R}^{300} \times \mathbb{R}$ for the training set, and 1500 samples for the testing set, from the following procedure. (The paper describes how the dataset was sampled for the experiment but does not provide concrete access information or mention a publicly available dataset.)
Dataset Splits | No | We sample 500 random samples $\{(x_i, y_i)\}_{i=1}^{500} \subset \mathbb{R}^{300} \times \mathbb{R}$ for the training set, and 1500 samples for the testing set, from the following procedure. (Only training and testing set sizes are specified; no validation split is mentioned. A sampling sketch in this spirit follows the table.)
Hardware Specification | No | The paper does not report the hardware used for its experiments (no GPU/CPU models, processor speeds, memory amounts, or other machine specifications).
Software Dependencies | No | Then, we adopt Mean Square Loss as loss function and ReLU as activation function. (The paper names the loss and activation functions but does not list any software libraries or version numbers.)
Experiment Setup | Yes | We first initialize the weights of Net-A by the Xavier Initialization and set all the bias components as 0.1, as the choice of bias does not greatly affect the conclusion. As a control group, Net-B shares all the initialized parameters with Net-A. We vary the scale of the initialization before training by dividing the weights from the Xavier Initialization by the scale. Then, we adopt Mean Square Loss as the loss function and ReLU as the activation function. ... We obtain evidence that supports our hypothesis by setting the scale to 10, 15, 20, 25, and other larger numbers. For each scale, we repeat the experiment fifty times. (A code sketch of this setup follows the table.)
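
The dataset rows above only fix the sample sizes and the ambient dimension; the paper's actual generating procedure is not reproduced on this page. The following is a minimal sketch of drawing a 500-sample training set and a 1500-sample testing set in R^300 x R, where `target_fn` is a hypothetical placeholder for the paper's procedure, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_fn(x):
    # Hypothetical placeholder for the paper's generating procedure.
    return np.sin(x.sum(axis=1) / x.shape[1])

def sample_dataset(n, dim=300):
    # Draw n inputs in R^dim and their scalar responses.
    x = rng.standard_normal((n, dim))
    return x, target_fn(x)

x_train, y_train = sample_dataset(500)    # training set: 500 samples
x_test, y_test = sample_dataset(1500)     # testing set: 1500 samples
```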
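
The quoted experiment setup can be read as the PyTorch-style sketch below, assuming a plain feed-forward network for Net-A and a residual counterpart for Net-B. The widths, depth, optimizer, learning rate, and epoch count are illustrative assumptions, not values taken from the paper; only the scaled Xavier initialization, the 0.1 biases, the shared initial parameters, the Mean Square Loss, the ReLU activations, and the tested scales follow the quote.

```python
import torch
import torch.nn as nn

class PlainBlock(nn.Module):
    """One fully connected ReLU layer (stand-in for Net-A's blocks)."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)

    def forward(self, x):
        return torch.relu(self.fc(x))

class ResBlock(nn.Module):
    """The same layer with a skip connection (stand-in for Net-B's blocks)."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)

    def forward(self, x):
        return x + torch.relu(self.fc(x))

def build_net(block, in_dim=300, width=64, depth=4):
    layers = [nn.Linear(in_dim, width), nn.ReLU()]
    layers += [block(width) for _ in range(depth)]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def init_weights(net, scale):
    # Xavier initialization divided by `scale`; every bias set to 0.1.
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            m.weight.data.div_(scale)
            nn.init.constant_(m.bias, 0.1)

def run_once(scale, x_train, y_train, x_test, y_test, epochs=200, lr=1e-2):
    net_a = build_net(PlainBlock)               # Net-A: plain feed-forward
    init_weights(net_a, scale)
    net_b = build_net(ResBlock)                 # Net-B: residual control group
    net_b.load_state_dict(net_a.state_dict())   # shares Net-A's initial parameters

    loss_fn = nn.MSELoss()                      # Mean Square Loss
    test_loss = {}
    for name, net in [("Net-A", net_a), ("Net-B", net_b)]:
        opt = torch.optim.SGD(net.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(net(x_train), y_train).backward()
            opt.step()
        with torch.no_grad():
            test_loss[name] = loss_fn(net(x_test), y_test).item()
    return test_loss

# Placeholder data in R^300 x R; see the sampling sketch above for the split sizes.
x_train, y_train = torch.randn(500, 300), torch.randn(500, 1)
x_test, y_test = torch.randn(1500, 300), torch.randn(1500, 1)

# The paper tries scales 10, 15, 20, 25 (and larger) and repeats each fifty times;
# a single repetition per scale is shown here to keep the sketch short.
for scale in [10, 15, 20, 25]:
    print(scale, run_once(scale, x_train, y_train, x_test, y_test))
```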