Is Deeper Better only when Shallow is Good?

Authors: Eran Malach, Shai Shalev-Shwartz

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments on learning fractal distributions with deep networks trained with SGD and assert that the approximation curve has a crucial effect on whether a depth efficiency is observed or not. ... In this section we present our experimental results on learning deep networks with Adam optimizer ([7]). ... We perform the same experiments with different fractal structures ... Finally, we want to show that the results given in this paper are interesting beyond the scope of our admittedly synthetic fractal distributions. ... To address this concern, we performed similar experiments on the CIFAR-10 data, studying the effect of width and depth on the performance of neural-networks on real data.
Researcher Affiliation | Academia | Eran Malach, School of Computer Science, The Hebrew University, Jerusalem, Israel (eran.malach@mail.huji.ac.il); Shai Shalev-Shwartz, School of Computer Science, The Hebrew University, Jerusalem, Israel (shais@cs.huji.ac.il)
Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | Finally, we analyze the behavior of networks of growing depth on CIFAR-10. ... We performed similar experiments on the CIFAR-10 data, studying the effect of width and depth on the performance of neural-networks on real data.
Dataset Splits | No | The paper states "We sample 50K examples for a train dataset and 5K examples for a test dataset" but does not mention a validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions "Adam optimizer ([7])" and "Tensorflow. Cifar-10 tensorflow tutorial, models/tutorials/image/cifar10. 2018." but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | We train feed-forward networks of varying depth and width on a 2D Cantor distribution of depth 5. We sample 50K examples for a train dataset and 5K examples for a test dataset. We train the networks on this dataset with Adam optimizer for 10^6 iterations, with batch size of 100 and different learning rates. We observe the best performance of each configuration (depth and width) on the test data along the runs.
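To make the reported experiment setup concrete, the sketch below generates a depth-5 2D Cantor-dust dataset (50K train / 5K test examples), builds fully-connected ReLU networks of varying depth and width, and trains them with Adam at several learning rates and batch size 100. This is a minimal sketch under stated assumptions, not the authors' code: the exact fractal distribution (here, positives inside the depth-5 Cantor dust, negatives uniform outside it), the depth/width/learning-rate grid, the activation function, and the use of tf.keras are all assumptions, and the training length is shortened from the paper's roughly 10^6 iterations.

```python
import numpy as np
import tensorflow as tf

DEPTH = 5  # depth of the Cantor construction, as in the paper's 2D Cantor distribution

def sample_cantor_points(n, depth=DEPTH, rng=None):
    """Sample n points uniformly from the cells retained at the given Cantor depth."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # A retained cell is picked by choosing the left or right third of each coordinate
    # at every level, i.e. base-3 digits restricted to {0, 2}.
    digits = 2 * rng.integers(0, 2, size=(n, 2, depth))
    scales = 3.0 ** -np.arange(1, depth + 1)
    corners = (digits * scales).sum(axis=2)                       # lower-left corner of the cell
    return corners + rng.uniform(0.0, 3.0 ** -depth, size=(n, 2))

def in_cantor(points, depth=DEPTH):
    """True for points inside the depth-level approximation of the 2D Cantor dust."""
    x = points.copy()
    inside = np.ones(len(points), dtype=bool)
    for _ in range(depth):
        x *= 3.0
        digit = np.clip(np.floor(x), 0, 2)
        inside &= (digit != 1).all(axis=1)   # points in a removed middle third are outside
        x -= digit
    return inside

def make_dataset(n, depth=DEPTH, seed=0):
    """Binary task: positives lie in the fractal set, negatives are uniform points outside it."""
    rng = np.random.default_rng(seed)
    pos = sample_cantor_points(n // 2, depth, rng)
    cand = rng.uniform(0.0, 1.0, size=(4 * n, 2))
    neg = cand[~in_cantor(cand, depth)][: n - n // 2]
    X = np.concatenate([pos, neg]).astype("float32")
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))]).astype("float32")
    perm = rng.permutation(len(X))
    return X[perm], y[perm]

def build_mlp(n_layers, width):
    """Fully-connected ReLU network of the given depth and width, sigmoid output."""
    hidden = [tf.keras.layers.Dense(width, activation="relu") for _ in range(n_layers)]
    return tf.keras.Sequential(hidden + [tf.keras.layers.Dense(1, activation="sigmoid")])

X_train, y_train = make_dataset(50_000)        # "50K examples for a train dataset"
X_test, y_test = make_dataset(5_000, seed=1)   # "5K examples for a test dataset"

results = {}
for n_layers in [2, 4, 6]:                 # illustrative grid; the paper sweeps depth and width
    for width in [16, 64]:
        for lr in [1e-3, 1e-4]:            # "different learning rates"
            model = build_mlp(n_layers, width)
            model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                          loss="binary_crossentropy", metrics=["accuracy"])
            # The paper trains for ~10^6 iterations with batch size 100; a few epochs
            # are used here only to keep the sketch cheap to run.
            hist = model.fit(X_train, y_train, batch_size=100, epochs=5,
                             validation_data=(X_test, y_test), verbose=0)
            results[(n_layers, width, lr)] = max(hist.history["val_accuracy"])

best = max(results, key=results.get)
print("best (depth, width, lr):", best, "test accuracy:", results[best])
```

Tracking the maximum of val_accuracy per configuration mirrors the paper's protocol of recording the best test performance of each (depth, width) pair along the runs; the results dictionary and the particular grid of depths, widths, and learning rates above are illustrative choices, not values taken from the paper.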