Learning Compact Neural Networks with Regularization
Authors: Samet Oymak
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To support our theoretical findings, we present numerical performance of sparsity and convolutional constraints for neural network training. We consider synthetic simulations where o is a vector of all ones and the weight matrix W ∈ R^{h×p} is sparse or corresponds to a CNN. (A hedged data-generation sketch follows the table.) |
| Researcher Affiliation | Collaboration | University of California, Riverside, CA, USA. Work done at The Voleon Group, Berkeley, CA, USA. |
| Pseudocode | No | The paper describes algorithms (e.g., Projected Gradient Descent) and their iterations but does not provide them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology, nor does it include a link to a code repository. |
| Open Datasets | No | We consider synthetic simulations where o is a vector of all ones and the weight matrix W ∈ R^{h×p} is sparse or corresponds to a CNN. ... We generate W matrices with exactly s nonzero entries at each row, and the nonzero pattern is distributed uniformly at random. ... We generate kernel entries with i.i.d. N(0, p/hb) and the random matrix Z with i.i.d. N(0, p/bk) entries. The paper does not provide access information (link, citation, etc.) for a publicly available or open dataset; it uses synthetic data generated for the experiments. |
| Dataset Splits | No | For training, we use n data points, where n varies from 100 to 1000. Test error is obtained by averaging over n_test = 1000 independent data points. The paper mentions training and test sets but does not specify a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We set the learning rate to µ = 5. ... We picked p = 80, h = 20, and s = p/10 = 8. For training, we use n data points, where n varies from 100 to 1000. ... Problem parameters are input dimension p = 81, kernel width b = 15, stride s = 6, number of kernels k = 4, and learning rate µ = 1. (A hedged projected-gradient-descent sketch using these values follows the table.) |
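
The synthetic setup quoted in the Research Type and Open Datasets rows can be illustrated with a short sketch. This is a minimal reconstruction for the sparse case only, not the author's released code: the Gaussian inputs, ReLU activation, and N(0, 1/s) values for the nonzero weights are assumptions, since the quoted text only specifies the all-ones output vector o, the row-wise sparsity pattern of W ∈ R^{h×p}, and the problem sizes p = 80, h = 20, s = 8.

```python
import numpy as np

# Hypothetical reconstruction of the sparse synthetic setup quoted above.
# Assumptions (not stated in the quotes): i.i.d. standard Gaussian inputs,
# ReLU activation, and N(0, 1/s) values for the nonzero weights.
def generate_sparse_planted_data(n, p=80, h=20, s=8, seed=None):
    rng = np.random.default_rng(seed)
    # W in R^{h x p}: exactly s nonzeros per row, support uniform at random.
    W = np.zeros((h, p))
    for i in range(h):
        support = rng.choice(p, size=s, replace=False)
        W[i, support] = rng.normal(0.0, 1.0 / np.sqrt(s), size=s)
    o = np.ones(h)                     # output weights: vector of all ones
    X = rng.normal(size=(n, p))        # assumed Gaussian inputs
    y = np.maximum(X @ W.T, 0.0) @ o   # labels y = o^T ReLU(W x)
    return X, y, W
```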
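
Similarly, the Pseudocode and Experiment Setup rows describe projected gradient descent with a sparsity constraint and a learning rate of µ = 5, but no algorithm block is given in the paper. The sketch below is a hedged interpretation under the same assumptions as above: it uses a squared loss, a random initialization, and row-wise hard thresholding as the projection onto s-sparse weights, none of which are spelled out in the quoted text, and it does not cover the convolutional variant (p = 81, b = 15, stride 6, k = 4, µ = 1).

```python
import numpy as np

def project_row_sparse(W, s):
    # Projection onto row-wise s-sparse matrices: keep the s largest-magnitude
    # entries in each row, zero out the rest (assumed form of the projection).
    W_proj = np.zeros_like(W)
    idx = np.argsort(-np.abs(W), axis=1)[:, :s]
    rows = np.arange(W.shape[0])[:, None]
    W_proj[rows, idx] = W[rows, idx]
    return W_proj

def train_pgd(X, y, h=20, s=8, lr=5.0, iters=2000, seed=None):
    # Projected gradient descent on a squared loss for y = o^T ReLU(W x),
    # with o fixed to all ones as in the quoted setup. The loss, the
    # initialization, and the iteration count are assumptions; lr=5.0
    # matches the quoted learning rate µ = 5.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    o = np.ones(h)
    W = project_row_sparse(rng.normal(0.0, 1.0 / np.sqrt(p), (h, p)), s)
    for _ in range(iters):
        pre = X @ W.T                             # n x h pre-activations
        resid = np.maximum(pre, 0.0) @ o - y      # prediction residuals
        # Gradient of (1/2n) * ||residuals||^2 with respect to W.
        grad = ((resid[:, None] * (pre > 0) * o).T @ X) / n
        W = project_row_sparse(W - lr * grad, s)  # gradient step + projection
    return W
```

Under these assumptions, a run would look like `X, y, W_true = generate_sparse_planted_data(n=500)` followed by `W_hat = train_pgd(X, y)`; the stopping criterion is likewise an assumption, since the paper's is not quoted here.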