Neural Network Architecture Beyond Width and Depth
Authors: Shijun Zhang, Zuowei Shen, Haizhao Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we use numerical experimentation to show the advantages of the super-approximation power of ReLU NestNets. |
| Researcher Affiliation | Academia | Zuowei Shen, Department of Mathematics, National University of Singapore, matzuows@nus.edu.sg; Haizhao Yang, Department of Mathematics, University of Maryland, College Park, hzyang@umd.edu; Shijun Zhang, Department of Mathematics, National University of Singapore, zhangshijun@u.nus.edu |
| Pseudocode | No | The paper describes the network architecture and mathematical definitions but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention releasing source code for the described methodology or provide any links to a code repository. |
| Open Datasets | Yes | We will design convolutional neural network (CNN) architectures activated by ReLU or the subnetwork activation function ϱ given in Equation (4) to classify image samples in Fashion-MNIST [47]. |
| Dataset Splits | No | The paper specifies training and test sample counts: For each i ∈ {0,1}, we randomly choose 3×10^5 training samples and 3×10^4 test samples in S_i with label i. For Fashion-MNIST, it states: This dataset consists of a training set of 6×10^4 samples and a test set of 10^4 samples. However, it does not explicitly mention a validation split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using RAdam [23] as the optimization method, but it does not specify any software names with version numbers for libraries, frameworks, or environments. |
| Experiment Setup | Yes | The number of epochs and the batch size are set to 500 and 512, respectively. We adopt RAdam [23] as the optimization method. In epochs 5(i−1)+1 to 5i for i = 1, 2, …, 100, the learning rate is 0.2×0.002×0.9^(i−1) for the parameters in ϱ and 0.002×0.9^(i−1) for all other parameters. |
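
Although the paper releases no code, Fashion-MNIST itself is publicly available through standard libraries. A minimal loading sketch, assuming PyTorch/torchvision (the paper does not name a framework), matching the quoted 6×10^4 training / 10^4 test split and the stated batch size of 512:

```python
import torch
from torchvision import datasets, transforms

# Fashion-MNIST ships with a fixed 60,000-sample training set and a
# 10,000-sample test set, matching the counts quoted in the paper.
train_set = datasets.FashionMNIST(
    "data", train=True, download=True, transform=transforms.ToTensor()
)
test_set = datasets.FashionMNIST(
    "data", train=False, download=True, transform=transforms.ToTensor()
)

# Batch size 512, as stated in the Experiment Setup row above.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=512, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=512, shuffle=False)
```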
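
The quoted experiment setup fully determines the learning-rate schedule, so it can be reconstructed even without released code. Below is a hedged sketch, not the authors' implementation: it uses PyTorch's `torch.optim.RAdam` (the paper cites RAdam [23] but names no framework), and `rho_params` / `other_params` are hypothetical handles for the parameters in ϱ and all remaining parameters.

```python
import torch

def block_lr(epoch: int, base: float) -> float:
    """Rate for a 1-indexed epoch: epochs 5(i-1)+1 .. 5i share block
    index i (i = 1..100), and the rate decays by 0.9 per block."""
    i = (epoch - 1) // 5 + 1
    return base * 0.9 ** (i - 1)

def train(model, loader, loss_fn, rho_params, other_params, epochs=500):
    # Two parameter groups: the activation ϱ uses base rate 0.2*0.002,
    # all other parameters use 0.002, per the quoted setup.
    opt = torch.optim.RAdam([
        {"params": rho_params,   "lr": 0.2 * 0.002},
        {"params": other_params, "lr": 0.002},
    ])
    for epoch in range(1, epochs + 1):
        for group, base in zip(opt.param_groups, (0.2 * 0.002, 0.002)):
            group["lr"] = block_lr(epoch, base)
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```

With 500 epochs and a decay step every 5 epochs, the schedule runs through exactly i = 1, …, 100 blocks, consistent with the quoted range.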