Deep Network Approximation in Terms of Intrinsic Parameters

Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

Venue: ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct several experiments to numerically verify that training a small part of parameters can also achieve good results for classification problems if other parameters are prespecified or pre-trained from a related problem."
Researcher Affiliation | Academia | Department of Mathematics, National University of Singapore, Singapore; Department of Mathematics, University of Maryland, College Park, United States.
Pseudocode | No | The paper contains network architecture illustrations and mathematical definitions of functions (e.g., ψ_{i+1}(x) := mid(ψ_i(x - δe_{i+1}), ψ_i(x), ψ_i(x + δe_{i+1}))), but no clearly labeled "Pseudocode" or "Algorithm" blocks. (An illustrative sketch of this recursion is given after the table.)
Open Source Code | No | The paper does not provide a link to source code or explicitly state that source code will be made available.
Open Datasets | Yes | "Finally, we use the proposed CNN architecture to conduct several experiments and present the numerical results for three common datasets: MNIST, Kuzushiji-MNIST (KMNIST), and Fashion-MNIST (FMNIST)." (A hedged data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions using "all training samples" and "all test samples" for different strategies, but does not explicitly state the train/validation/test splits needed for full reproduction (e.g., an 80/10/10 split).
Hardware Specification | No | No specific hardware (e.g., NVIDIA A100, Tesla V100, Intel Xeon) used to run the experiments is mentioned in the paper.
Software Dependencies | No | The paper mentions RAdam as the optimization method and dropout and batch normalization as regularization methods, but does not specify versions of any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python) used for the implementation.
Experiment Setup | Yes | "The number of epochs and the batch size are set to 300 and 128, respectively. We adopt RAdam (Liu et al., 2020) as the optimization method. The weight decay of the optimizer is 0.0001 and the learning rate is 0.002 × 0.93^(i-1) in the i-th epoch." (A hedged training-configuration sketch follows the table.)
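
The recursion quoted in the Pseudocode row defines ψ_{i+1} from ψ_i by taking, at each point x, the middle value of ψ_i evaluated at x and at x shifted by ±δ along the (i+1)-th coordinate direction. The Python sketch below only illustrates that definition and is not code from the paper; the middle-of-three semantics of mid and all variable names are assumptions based on the quoted formula.

```python
# Illustrative sketch (not from the paper): the recursion
#   psi_{i+1}(x) = mid(psi_i(x - delta*e_{i+1}), psi_i(x), psi_i(x + delta*e_{i+1})),
# where mid(a, b, c) is assumed to return the middle value of its three arguments.
import numpy as np

def mid(a, b, c):
    """Middle value of three numbers (assumed semantics of 'mid')."""
    return sorted([a, b, c])[1]

def next_psi(psi_i, delta, e_next):
    """Build psi_{i+1} from psi_i, step size delta, and basis vector e_{i+1}."""
    def psi_next(x):
        x = np.asarray(x, dtype=float)
        return mid(psi_i(x - delta * e_next),
                   psi_i(x),
                   psi_i(x + delta * e_next))
    return psi_next

# Tiny usage example with a hypothetical starting function psi_0(x) = max(x).
psi_0 = lambda x: float(np.max(x))
e_2 = np.array([0.0, 1.0])
psi_1 = next_psi(psi_0, delta=0.1, e_next=e_2)
print(psi_1(np.array([0.3, 0.7])))  # mid(0.6, 0.7, 0.8) = 0.7
```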
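
The Open Datasets row names MNIST, KMNIST, and FMNIST, but the paper does not say which framework or data pipeline was used (see the Software Dependencies row). A minimal sketch, assuming PyTorch/torchvision, which ships loaders for all three datasets:

```python
# Hedged sketch: torchvision is an assumption, not something the paper specifies.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
root = "./data"  # hypothetical download location

train_sets = {
    "MNIST": datasets.MNIST(root, train=True, download=True, transform=to_tensor),
    "KMNIST": datasets.KMNIST(root, train=True, download=True, transform=to_tensor),
    "FMNIST": datasets.FashionMNIST(root, train=True, download=True, transform=to_tensor),
}
test_sets = {
    "MNIST": datasets.MNIST(root, train=False, download=True, transform=to_tensor),
    "KMNIST": datasets.KMNIST(root, train=False, download=True, transform=to_tensor),
    "FMNIST": datasets.FashionMNIST(root, train=False, download=True, transform=to_tensor),
}

# Batch size 128 as reported in the Experiment Setup row; the paper uses the
# standard train/test splits ("all training samples" / "all test samples").
train_loaders = {name: torch.utils.data.DataLoader(ds, batch_size=128, shuffle=True)
                 for name, ds in train_sets.items()}
```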
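
The hyperparameters in the Experiment Setup row translate directly into an optimizer plus an exponential learning-rate schedule. The sketch below assumes PyTorch; the placeholder model stands in for the paper's CNN, and only the quoted hyperparameters (300 epochs, batch size 128, RAdam, weight decay 0.0001, learning rate 0.002 × 0.93^(i-1)) come from the paper. The paper's experiments additionally train only a small subset of the parameters, which this sketch does not reproduce.

```python
# Hedged sketch of the reported optimization setup (PyTorch is an assumption).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder, not the paper's CNN
criterion = nn.CrossEntropyLoss()

# RAdam with lr = 0.002 and weight decay = 0.0001, as stated in the paper.
optimizer = torch.optim.RAdam(model.parameters(), lr=0.002, weight_decay=1e-4)
# Multiplying the LR by 0.93 after every epoch gives 0.002 * 0.93**(i - 1) in epoch i.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.93)

num_epochs = 300
for epoch in range(num_epochs):
    for images, labels in train_loaders["MNIST"]:  # from the data-loading sketch above
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```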