Generalization and Estimation Error Bounds for Model-based Neural Networks

Authors: Avner Shultzman, Eyar Azar, Miguel R. D. Rodrigues, Yonina C. Eldar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate through a series of experiments that our theoretical insights shed light on a few behaviours experienced in practice, including the fact that ISTA and ADMM networks exhibit higher generalization abilities (especially for a small number of training samples), compared to ReLU networks." ... From Section 5 (Numerical Experiments): "In this section, we present a series of experiments that concentrate on how a particular model-based network (ISTA network) compares to a ReLU network, and showcase the merits of model-based networks."
Researcher Affiliation | Academia | Avner Shultzman¹, Eyar Azar¹, Miguel R. D. Rodrigues² & Yonina C. Eldar¹; ¹Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Israel; ²Faculty of Engineering Science, University College London, UK
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Figure 1 shows architectural diagrams, not algorithmic steps.
Open Source Code | Yes | "All results are reproducible through (Authors, 2022), which provides the complete code to execute the experiments presented in this section."
Open Datasets | No | "The networks are trained on a simulated dataset to solve the problem in (1), with target vectors uniformly distributed in [-1, 1]." ... "The target and noise vectors are generated element-wise independently from a uniform distribution ranging in [-1, 1]." The paper describes a simulated dataset generated by the authors but does not provide access information (link, DOI, or citation) for a publicly available or open dataset. (See the data-generation sketch after this table.)
Dataset Splits | No | The paper states it uses "m training samples" and an "empirical approximation of h", but does not specify a distinct validation set or the percentages/counts for train/validation/test splits needed for reproduction.
Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions using the "SGD optimizer" and "L1 loss", but it does not provide specific software names with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | "We focus on networks with 10 layers (similar to previous works (Gregor & LeCun, 2010)), to represent realistic model-based network architectures. ... The sparsity rate is ρ = 0.15, and the noise's standard deviation is 0.1. To train the networks we used the SGD optimizer with the L1 loss over all neurons of the last layer." (See the training sketch after this table.)
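
The dataset referenced in the Open Datasets row is simulated rather than publicly released. As a rough illustration only, the following is a minimal sketch of how such data could be generated, assuming the standard sparse recovery model y = Ax + n behind problem (1); the dimensions, the measurement matrix A, and the rescaling of the uniform noise to standard deviation 0.1 are assumptions, not values taken from the paper.

```python
import numpy as np

def make_simulated_dataset(m, n_features=100, n_measurements=50,
                           rho=0.15, noise_std=0.1, seed=0):
    """Hypothetical generator for the simulated sparse-recovery data described above."""
    rng = np.random.default_rng(seed)

    # Placeholder measurement matrix for problem (1); the paper's exact operator is not given here.
    A = rng.standard_normal((n_measurements, n_features)) / np.sqrt(n_measurements)

    # Sparse targets: entries are nonzero with probability rho = 0.15,
    # with values drawn element-wise from a uniform distribution on [-1, 1].
    support = rng.random((m, n_features)) < rho
    x = rng.uniform(-1.0, 1.0, size=(m, n_features)) * support

    # Noise drawn element-wise from a uniform distribution on [-1, 1],
    # then rescaled so its standard deviation is ~0.1 (the rescaling is an assumption).
    noise = rng.uniform(-1.0, 1.0, size=(m, n_measurements))
    noise *= noise_std / noise.std()

    y = x @ A.T + noise
    return A, x.astype(np.float32), y.astype(np.float32)
```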
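
Likewise, the Experiment Setup row only fixes the number of layers (10), the optimizer (SGD), and the loss (L1 over all neurons of the last layer). Below is a minimal LISTA-style sketch of such a 10-layer unrolled ISTA network and its training loop; the layer widths, weight sharing across layers, learning rate, and number of epochs are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class ISTANet(nn.Module):
    """Hypothetical 10-layer unrolled (LISTA-style) ISTA network; not the authors' exact model."""
    def __init__(self, n_measurements=50, n_features=100, n_layers=10, theta=0.1):
        super().__init__()
        self.n_layers = n_layers
        self.W1 = nn.Linear(n_measurements, n_features, bias=False)  # acts on the measurements y
        self.W2 = nn.Linear(n_features, n_features, bias=False)      # acts on the previous estimate x
        self.theta = nn.Parameter(torch.tensor(theta))               # soft-threshold level

    def forward(self, y):
        x = torch.zeros(y.shape[0], self.W2.in_features, device=y.device)
        for _ in range(self.n_layers):
            z = self.W1(y) + self.W2(x)
            # Soft-thresholding (the ISTA proximal step), written explicitly so theta stays learnable.
            x = torch.sign(z) * torch.relu(torch.abs(z) - self.theta)
        return x

def train(model, y, x_target, epochs=100, lr=1e-2):
    """SGD with an L1 loss over all neurons of the last layer, as stated in the setup row."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.L1Loss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(y), x_target)
        loss.backward()
        optimizer.step()
    return model
```

With the data sketch above, usage would look like `A, x, y = make_simulated_dataset(m=1000)` followed by `train(ISTANet(), torch.from_numpy(y), torch.from_numpy(x))`; all names here are hypothetical and serve only to make the reported setup concrete.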