Generalization and Estimation Error Bounds for Model-based Neural Networks
Authors: Avner Shultzman, Eyar Azar, Miguel R. D. Rodrigues, Yonina C. Eldar
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through a series of experiments that our theoretical insights shed light on a few behaviours experienced in practice, including the fact that ISTA and ADMM networks exhibit higher generalization abilities (especially for small number of training samples), compared to ReLU networks. ... 5 NUMERICAL EXPERIMENTS In this section, we present a series of experiments that concentrate on how a particular model-based network (ISTA network) compares to a ReLU network, and showcase the merits of model-based networks. |
| Researcher Affiliation | Academia | Avner Shultzman¹, Eyar Azar¹, Miguel R. D. Rodrigues² & Yonina C. Eldar¹. ¹Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Israel; ²Faculty of Engineering Science, University College London, UK |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Figure 1 shows architectural diagrams, not algorithmic steps. |
| Open Source Code | Yes | All results are reproducible through (Authors, 2022) which provides the complete code to execute the experiments presented in this section. |
| Open Datasets | No | The networks are trained on a simulated dataset to solve the problem in (1), with target vectors uniformly distributed in [-1, 1]. ... The target and noise vectors are generated element-wise independently from a uniform distribution ranging in [-1, 1]. The paper mentions a 'simulated dataset' generated by the authors, but does not provide access information (link, DOI, citation) for a publicly available or open dataset. (A hedged data-generation sketch following this description appears after the table.) |
| Dataset Splits | No | The paper states it uses 'm training samples' and an 'empirical approximation of h' but does not specify a distinct validation set or the percentages/counts for train/validation/test splits needed for reproduction. |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions using the 'SGD optimizer' and 'L1 loss', but it does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We focus on networks with 10 layers (similar to previous works (Gregor & LeCun, 2010)), to represent realistic model-based network architectures. ... The sparsity rate is ρ = 0.15, and the noise's standard deviation is 0.1. To train the networks we used the SGD optimizer with the L1 loss over all neurons of the last layer. |
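
The Open Datasets row describes a simulated sparse-recovery dataset, but the quoted text does not fix the problem dimensions or the exact measurement model. The sketch below is a minimal reconstruction under assumptions: a linear model y = Ax + e with a random Gaussian dictionary A, targets that are ρ-sparse with nonzero entries uniform in [-1, 1], and additive noise rescaled to standard deviation 0.1. The sizes `m`, `n`, and `d` are hypothetical placeholders, not values from the paper.

```python
# A minimal sketch (not the authors' released code) of the simulated dataset
# described in the Open Datasets row, assuming a sparse-recovery model y = A x + e.
import numpy as np

def make_dataset(m=1000, d=100, n=50, rho=0.15, noise_std=0.1, seed=0):
    """Generate m (measurement, sparse-target) pairs; dimensions are assumed."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d)) / np.sqrt(n)   # assumed random dictionary
    # Sparse targets: each entry is nonzero with probability rho,
    # nonzero values drawn uniformly from [-1, 1], as quoted above.
    support = rng.random((m, d)) < rho
    x = rng.uniform(-1.0, 1.0, size=(m, d)) * support
    # Additive noise rescaled to standard deviation 0.1; the quote suggests a
    # uniform distribution, but the exact noise model is an assumption here.
    e = rng.uniform(-1.0, 1.0, size=(m, n))
    e *= noise_std / e.std()
    y = x @ A.T + e
    return A, x, y
```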
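
Similarly, the Experiment Setup row quotes a 10-layer model-based (ISTA) network trained with SGD and an L1 loss over the last layer's neurons. The following is one plausible LISTA-style unfolding consistent with that description; the layer parameterization, threshold initialization, learning rate, and epoch count are assumptions rather than the authors' configuration, which is available through the code cited in the Open Source Code row.

```python
# A minimal PyTorch sketch of a 10-layer ISTA (LISTA-style) network and a
# training loop matching the quoted setup (SGD optimizer, L1 loss).
import torch
import torch.nn as nn

class ISTANet(nn.Module):
    def __init__(self, n, d, num_layers=10):
        super().__init__()
        self.W1 = nn.ModuleList(nn.Linear(n, d, bias=False) for _ in range(num_layers))
        self.W2 = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(num_layers))
        self.theta = nn.Parameter(torch.full((num_layers,), 0.1))  # per-layer thresholds (assumed init)

    def forward(self, y):
        x = torch.zeros(y.shape[0], self.W2[0].in_features, device=y.device)
        for k, (w1, w2) in enumerate(zip(self.W1, self.W2)):
            # One unfolded ISTA iteration: linear step followed by soft-thresholding.
            z = w1(y) + w2(x)
            x = torch.sign(z) * torch.clamp(z.abs() - self.theta[k], min=0.0)
        return x

def train(model, y, x_true, epochs=50, lr=1e-2):
    # SGD with L1 loss, as quoted; batch handling, epochs, and lr are guesses.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(y), x_true)
        loss.backward()
        opt.step()
    return model

# Example usage with the simulated data from the previous sketch:
# A, x, y = make_dataset()
# model = train(ISTANet(n=50, d=100),
#               torch.tensor(y, dtype=torch.float32),
#               torch.tensor(x, dtype=torch.float32))
```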