Theoretical Investigation of Generalization Bound for Residual Networks
Authors: Hao Chen, Zhanfeng Mo, Zhouwang Yang, Xiao Wang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments also verify that ResNet structures contribute to better generalization properties. (Section 7: Numerical Experiments) |
| Researcher Affiliation | Academia | Hao Chen¹, Zhanfeng Mo¹, Zhouwang Yang¹, and Xiao Wang². ¹University of Science and Technology of China, ²Purdue University. {ch330822, oscarmzf}@mail.ustc.edu.com, yangzw@ustc.edu.cn, wangxiao@purdue.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It focuses on mathematical formulations and theoretical derivations. |
| Open Source Code | No | The rest of experiment data is concluded in the supplementary material [Mo and Chen, 2019]. (This statement refers to 'experiment data' in the supplementary material, not explicitly source code for the methodology.) |
| Open Datasets | No | We sample 500 random samples $\{(x_i, y_i)\}_{i=1}^{500} \subset \mathbb{R}^{300} \times \mathbb{R}$ for the training set, and 1500 samples for the testing set from the following procedure. (The paper describes how the dataset was sampled for the experiment but does not provide concrete access information or mention a publicly available dataset.) |
| Dataset Splits | No | We sample 500 random samples $\{(x_i, y_i)\}_{i=1}^{500} \subset \mathbb{R}^{300} \times \mathbb{R}$ for the training set, and 1500 samples for the testing set from the following procedure. (Only training and testing set sizes are specified; no validation split is mentioned.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | Then, we adopt Mean Square Loss as the loss function and ReLU as the activation function. (The paper mentions loss and activation functions but does not provide specific software library names or version numbers.) |
| Experiment Setup | Yes | We first initialize the weights of Net-A by the Xavier Initialization and set all the bias components as 0.1, as the choice of bias does not greatly affect the conclusion. As a control group, Net-B shares all the initialized parameters with Net-A. We vary the scale of the initialization before training by dividing the weights from the Xavier Initialization by the scale. Then, we adopt Mean Square Loss as the loss function and ReLU as the activation function. ... We obtain evidence that supports our hypothesis by setting the scale as 10, 15, 20, 25, and other larger numbers. For each scale, we repeat the experiment fifty times. (See the initialization sketch after this table.) |
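
The table above quotes the paper's initialization protocol but no reference implementation is released. The following is a minimal sketch of that setup, assuming PyTorch, a small fully connected architecture, and that Net-A is the residual network while Net-B is its plain counterpart; the layer widths, depth, framework choice, and which network carries the skip connections are assumptions not stated in the excerpt.

```python
import torch
import torch.nn as nn

class PlainBlock(nn.Module):
    """One fully connected layer followed by ReLU."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.fc(x))

class ResBlock(PlainBlock):
    """Same layer, but with an identity skip connection added."""
    def forward(self, x):
        return x + torch.relu(self.fc(x))

def build_net(block, in_dim=300, width=64, depth=4):
    # Hypothetical width/depth; the excerpt only fixes the input dimension (300)
    # and the scalar output.
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.ReLU(),
        *[block(width) for _ in range(depth)],
        nn.Linear(width, 1),
    )

def init_scaled(model, scale):
    # Xavier-initialize every weight, divide it by `scale`, and set all bias
    # components to 0.1, mirroring the quoted setup.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            with torch.no_grad():
                m.weight.div_(scale)
                m.bias.fill_(0.1)
    return model

scale = 10  # the paper also reports scales of 15, 20, 25, and larger
net_a = init_scaled(build_net(ResBlock), scale)    # residual network (assumed)
net_b = build_net(PlainBlock)                      # control group
net_b.load_state_dict(net_a.state_dict())          # shares all initialized parameters

criterion = nn.MSELoss()  # Mean Square Loss, ReLU activations throughout
```

Under the paper's protocol, this construction would be repeated fifty times per scale, with both networks trained from the shared initialization so that any gap in generalization can be attributed to the residual structure rather than to the starting point.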