Deep Network Approximation in Terms of Intrinsic Parameters
Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct several experiments to numerically verify that training a small part of parameters can also achieve good results for classification problems if other parameters are prespecified or pre-trained from a related problem. |
| Researcher Affiliation | Academia | ¹Department of Mathematics, National University of Singapore, Singapore; ²Department of Mathematics, University of Maryland, College Park, United States. |
| Pseudocode | No | The paper contains network architecture illustrations and mathematical definitions of functions (e.g., 'ψ_{i+1}(x) := mid(ψ_i(x − δe_{i+1}), ψ_i(x), ψ_i(x + δe_{i+1}))'), but no clearly labeled 'Pseudocode' or 'Algorithm' blocks. A sketch of this mid-based construction is given after the table. |
| Open Source Code | No | The paper does not provide any link to source code or explicitly state that source code will be made available. |
| Open Datasets | Yes | Finally, we use the proposed CNN architecture to conduct several experiments and present the numerical results for three common datasets: MNIST, Kuzushiji-MNIST (KMNIST), and Fashion-MNIST (FMNIST). |
| Dataset Splits | No | The paper mentions using 'all training samples' and 'all test samples' for different strategies, but does not explicitly state the train/validation/test dataset splits needed for full reproduction (e.g., '80/10/10 split'). |
| Hardware Specification | No | No specific hardware details (e.g., 'NVIDIA A100', 'Tesla V100', 'Intel Xeon') used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'RAdam' as the optimization method and 'dropout' and 'batch normalization' as regularization methods, but it does not specify versions of any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version) used for implementation. |
| Experiment Setup | Yes | The number of epochs and the batch size are set to 300 and 128, respectively. We adopt RAdam (Liu et al., 2020) as the optimization method. The weight decay of the optimizer is 0.0001 and the learning rate is 0.002 × 0.93^(i−1) in the i-th epoch. A hedged training-setup sketch follows the table. |
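
The 'mid' construction quoted in the Pseudocode row can be read as a pointwise recursion: ψ_{i+1} takes the middle value of ψ_i evaluated at x shifted by ±δ along the (i+1)-th coordinate and at x itself. The sketch below only illustrates that recursion numerically; in the paper the mid function is itself realized by a ReLU network, and the names `mid`, `lift`, and `psi0` are illustrative, not from the paper.

```python
import numpy as np

def mid(a, b, c):
    # Middle value of three numbers (elementwise for arrays):
    # the sum minus the max and the min leaves the median.
    return a + b + c - np.maximum(np.maximum(a, b), c) - np.minimum(np.minimum(a, b), c)

def lift(psi_i, coord, delta):
    # One step of psi_{i+1}(x) = mid(psi_i(x - delta*e), psi_i(x), psi_i(x + delta*e)),
    # where e is the standard basis vector for the given coordinate (0-based here).
    def psi_next(x):
        x = np.asarray(x, dtype=float)
        e = np.zeros_like(x)
        e[coord] = 1.0
        return mid(psi_i(x - delta * e), psi_i(x), psi_i(x + delta * e))
    return psi_next

# Toy usage: start from some psi_0 on R^2 and lift along the second coordinate.
psi0 = lambda x: float(np.sum(x ** 2))
psi1 = lift(psi0, coord=1, delta=0.1)
print(psi1(np.array([0.3, 0.7])))
```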
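
The Experiment Setup row reports the optimizer, weight decay, epoch count, batch size, and learning-rate schedule, but the paper does not name a framework (see the Software Dependencies row). The following is a minimal sketch assuming PyTorch; the random data and the single-layer model are placeholders for the MNIST-style inputs and the paper's CNN, and only the hyperparameters quoted above are taken from the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model standing in for MNIST-style inputs and the paper's CNN.
images = torch.randn(1024, 1, 28, 28)
labels = torch.randint(0, 10, (1024,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

# RAdam with the reported initial learning rate 0.002 and weight decay 1e-4.
optimizer = torch.optim.RAdam(model.parameters(), lr=0.002, weight_decay=1e-4)
# Multiplying by 0.93 after each epoch gives 0.002 * 0.93**(i - 1) in the i-th epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.93)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(300):          # 300 epochs, batch size 128 via the loader above
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()              # apply the per-epoch learning-rate decay
```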