Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks
Authors: Itay Safran, Ohad Shamir
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also show that these gaps can be observed experimentally: Increasing the depth indeed allows better learning than increasing width, when training neural networks to learn an indicator of a unit ball. In this subsection, we empirically demonstrate that indicator functions of L2 balls are indeed easier to learn with a 3-layer network, compared to a 2-layer network (even if the 2-layer network is significantly larger). This indicates that the depth/width trade-off for indicators of balls, predicted by our theory, can indeed be observed experimentally. The results are presented in Fig. 1. |
| Researcher Affiliation | Academia | 1Weizmann Institute of Science, Rehovot, Israel. Correspondence to: Itay Safran <itay.safran@weizmann.ac.il>, Ohad Shamir <ohad.shamir@weizmann.ac.il>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | The authors created their own dataset but do not provide public access (no link, DOI, repository, or citation) for accessing it. The text states: 'For our experiment, we sampled 5 × 10^5 data instances in R^100, with a direction chosen uniformly at random and a norm drawn uniformly at random from the interval [0, 2].' (See the data-generation sketch after the table.) |
| Dataset Splits | Yes | For our experiment, we sampled 5 × 10^5 data instances in R^100... Another 5 × 10^4 examples were generated in a similar manner and used as a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'TensorFlow library' but does not specify a version number for this or any other software component required for replication. |
| Experiment Setup | Yes | We used the squared loss ℓ(y, y') = (y' - y)^2 and batches of size 100. For all networks, we chose a momentum parameter of 0.95, and a learning rate starting at 0.1, decaying by a multiplicative factor of 0.95 every 1000 batches, and stopping at 10^-4. (See the schedule sketch after the table.) |
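
For concreteness, the data-generation procedure quoted in the Open Datasets and Dataset Splits rows can be sketched as follows. This is a minimal NumPy reconstruction under stated assumptions: the random seeds, the label convention (1 inside the unit L2 ball, 0 outside), and the float32 dtype are not given in the quoted text and are chosen here for illustration only.

```python
import numpy as np

def sample_ball_indicator_data(n, d=100, seed=0):
    """Sample n points in R^d with a uniformly random direction and a norm drawn
    uniformly from [0, 2], labeled by the indicator of the unit L2 ball."""
    rng = np.random.default_rng(seed)                                  # seed is an assumption
    directions = rng.standard_normal((n, d), dtype=np.float32)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)   # uniform on the unit sphere
    norms = rng.uniform(0.0, 2.0, size=(n, 1))                        # norm uniform on [0, 2]
    X = (directions * norms).astype(np.float32)
    y = (np.linalg.norm(X, axis=1) <= 1.0).astype(np.float32)         # unit-ball indicator (assumed 0/1 labels)
    return X, y

# Roughly matching the sizes quoted above; the exact train/validation generation details are unknown.
X_train, y_train = sample_ball_indicator_data(5 * 10**5)
X_val, y_val = sample_ball_indicator_data(5 * 10**4, seed=1)
```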
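Similarly, the learning-rate schedule quoted under Experiment Setup (start at 0.1, multiply by 0.95 every 1000 batches, stop at 10^-4) can be read as the step-decay rule below. Treating 'stopping at 10^-4' as a floor on the rate is an assumption; the quoted text does not say whether training halts at that point or the rate is simply clamped. The momentum value of 0.95 and the squared loss would then plug into a standard SGD-with-momentum optimizer, with batches of size 100.

```python
def learning_rate(batch_index):
    """Step-decay schedule quoted above: start at 0.1 and multiply by 0.95
    every 1000 batches; the 1e-4 floor is one reading of 'stopping at 10^-4'."""
    return max(0.1 * 0.95 ** (batch_index // 1000), 1e-4)

# Example: rate applied to the 10,000th batch.
print(learning_rate(10_000))  # 0.1 * 0.95**10 ≈ 0.0599
```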