Neural Network Architecture Beyond Width and Depth

Authors: Shijun Zhang, Zuowei Shen, Haizhao Yang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we use numerical experimentation to show the advantages of the super-approximation power of ReLU NestNets."
Researcher Affiliation | Academia | Zuowei Shen, Department of Mathematics, National University of Singapore (matzuows@nus.edu.sg); Haizhao Yang, Department of Mathematics, University of Maryland, College Park (hzyang@umd.edu); Shijun Zhang, Department of Mathematics, National University of Singapore (zhangshijun@u.nus.edu)
Pseudocode | No | The paper describes the network architecture and mathematical definitions but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not mention releasing source code for the described methodology or provide any links to a code repository.
Open Datasets | Yes | "We will design convolutional neural network (CNN) architectures activated by ReLU or the subnetwork activation function ϱ given in Equation (4) to classify image samples in Fashion-MNIST [47]."
Dataset Splits | No | The paper specifies training and test sample counts: "For each i ∈ {0, 1}, we randomly choose 3 × 10^5 training samples and 3 × 10^4 test samples in S_i with label i." For Fashion-MNIST, it states: "This dataset consists of a training set of 6 × 10^4 samples and a test set of 10^4 samples." However, it does not explicitly mention a validation split. (See the loading sketch after this table.)
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions using RAdam [23] as the optimization method, but it does not specify any software names with version numbers for libraries, frameworks, or environments.
Experiment Setup | Yes | "The number of epochs and the batch size are set to 500 and 512, respectively. We adopt RAdam [23] as the optimization method. In epochs 5(i-1)+1 to 5i for i = 1, 2, ..., 100, the learning rate is 0.2 × 0.002 × 0.9^(i-1) for the parameters in ϱ and 0.002 × 0.9^(i-1) for all other parameters." (See the schedule sketch after this table.)
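
As a quick check on the quoted Fashion-MNIST split sizes, the torchvision loader below reproduces the 6 × 10^4 / 10^4 train/test counts; the paper's synthetic sets S_0 and S_1 are not publicly specified here, so they are not sketched. This is a minimal illustration, not the authors' code.

```python
# Minimal sketch: load Fashion-MNIST with torchvision and confirm the
# train/test counts quoted in the paper (60,000 / 10,000 samples).
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.FashionMNIST(root="data", train=True,
                                  download=True, transform=transform)
test_set = datasets.FashionMNIST(root="data", train=False,
                                 download=True, transform=transform)

assert len(train_set) == 60_000  # training split: 6 x 10^4 samples
assert len(test_set) == 10_000   # test split: 10^4 samples
# No validation split is defined, matching the paper's description.
```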
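The quoted schedule can be read as a staircase decay: the learning rate is held constant within each 5-epoch block (i = 1, ..., 100 covers all 500 epochs), with the ϱ parameters using a rate scaled by 0.2. The sketch below implements that reading with PyTorch's RAdam. The TinyNet module, its rho_weight parameter, and the name-based parameter split are illustrative assumptions, not the paper's CNN architecture or the activation ϱ of Equation (4).

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Placeholder classifier; the paper's CNN and its subnetwork
    activation ϱ from Equation (4) are not reproduced here."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)
        # Stand-in learnable activation parameters (hypothetical "rho" group).
        self.rho_weight = nn.Parameter(torch.ones(128))

    def forward(self, x):
        h = self.fc1(x.flatten(1))
        h = torch.relu(self.rho_weight * h)  # placeholder for ϱ
        return self.fc2(h)

def block_lr(epoch: int, base: float) -> float:
    # Epochs 5(i-1)+1 .. 5i share one rate; i runs from 1 to 100.
    i = (epoch - 1) // 5 + 1
    return base * 0.9 ** (i - 1)

model = TinyNet()
rho_params = [p for n, p in model.named_parameters() if "rho" in n]
other_params = [p for n, p in model.named_parameters() if "rho" not in n]

# Two parameter groups: the ϱ parameters use a rate scaled by 0.2.
optimizer = torch.optim.RAdam([
    {"params": rho_params, "lr": 0.2 * 0.002},
    {"params": other_params, "lr": 0.002},
])

for epoch in range(1, 501):  # 500 epochs; the paper uses batch size 512
    optimizer.param_groups[0]["lr"] = block_lr(epoch, 0.2 * 0.002)
    optimizer.param_groups[1]["lr"] = block_lr(epoch, 0.002)
    # ... one training pass over the data would go here ...
```

Per-group rates are set manually at the top of each epoch rather than via a scheduler, since the two groups decay from different bases; a pair of LambdaLR schedulers would be an equivalent choice.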