Most Activation Functions Can Win the Lottery Without Excessive Depth

Authors: Rebekka Burkholz

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (3 Experiments) | To demonstrate that our theoretical results make realistic claims, we present three sets of experiments that highlight different advantages of the (L+1)-construction and the 2L-construction. In all cases, we emulate our constructive existence proofs by pruning source networks to approximate a given target network. All experiments were conducted on a machine with an Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz and an NVIDIA GeForce RTX 3080 Ti GPU. Table 1: LT pruning results on MNIST. Averages and 0.95 standard confidence intervals are reported for 5 independent source network initializations. Parameters are counted in packs of 1000. (A hedged sketch of the two-layer idea behind such constructions follows the table.)
Researcher Affiliation | Academia | Rebekka Burkholz, CISPA Helmholtz Center for Information Security, 66123 Saarbrücken, Germany, burkholz@cispa.de
Pseudocode | No | The paper contains detailed proof outlines (e.g., the "Proof Outline" paragraphs for Theorems 2.5 and 2.6) that describe the steps, but these are not formatted as pseudocode or an algorithm block.
Open Source Code | Yes | Code is available on GitHub (RelationalML/LT-existence).
Open Datasets | Yes | As in the influential work [13], we use Iterative Magnitude Pruning (IMP) on LeNet networks with architecture [784, 300, 100, 10] to find LTs that achieve a good performance on the MNIST classification task [7]. (A hedged IMP sketch follows the table.)
Dataset Splits | No | The paper mentions training on the MNIST classification task and on tiny-ImageNet training data and evaluating on tiny-ImageNet test data, but does not specify a validation split or split percentages (e.g., 80/10/10, or specific counts for validation).
Hardware Specification | Yes | All experiments were conducted on a machine with an Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz and an NVIDIA GeForce RTX 3080 Ti GPU.
Software Dependencies | No | Using the PyTorch implementation of the GitHub repository open_lth with MIT license, we arrive at a target network for each of four considered activation functions after 12 pruning steps: RELU, LRELU, SIGMOID, and TANH. (PyTorch and open_lth are named, but no version numbers or a full dependency list are given.)
Experiment Setup | Yes | Using the PyTorch implementation of the GitHub repository open_lth with MIT license, we arrive at a target network for each of four considered activation functions after 12 pruning steps: RELU, LRELU, SIGMOID, and TANH. Their performance and number of nonzero parameters are reported in Table 1 in the target column alongside our results for the (L+1)-construction and our 2L-construction, which achieve a similar performance.
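
For intuition on the constructions named in the Research Type row: for ReLU, one linear target layer can be rewritten as two layers via the identity x = relu(x) - relu(-x), which is the usual reason why roughly two source layers per target layer suffice in such existence results. The snippet below only checks this identity numerically; it is a sketch for intuition, not the paper's construction, which also covers other activation functions and the depth-(L+1) regime. The tensors W and x are arbitrary placeholders.

```python
# Hedged illustration (not the paper's construction): for ReLU, one linear
# target layer y = W x can be rebuilt from two layers because
# x = relu(x) - relu(-x), so y = W relu(x) - W relu(-x).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(100, 784)   # hypothetical target layer weights
x = torch.randn(32, 784)    # a batch of hypothetical inputs

# Output of the single target layer.
y_target = x @ W.T

# Two-layer emulation: the first layer duplicates the input with signs +1/-1,
# the second layer combines the two ReLU branches with weights W and -W.
h = F.relu(torch.cat([x, -x], dim=1))   # first (source) layer
W2 = torch.cat([W, -W], dim=1)          # second (source) layer weights
y_emulated = h @ W2.T

print(torch.allclose(y_target, y_emulated, atol=1e-5))  # True
```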
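The Open Datasets and Experiment Setup rows describe IMP on a [784, 300, 100, 10] LeNet with one of four activation functions. Below is a minimal sketch of that pipeline in PyTorch, assuming torchvision for MNIST and torch.nn.utils.prune for magnitude pruning. The function names (make_lenet, iterative_magnitude_pruning), the optimizer, learning rate, single epoch per round, 20% per-round pruning fraction, and rewinding of surviving weights to initialization are illustrative assumptions, not the settings of the paper or the open_lth repository.

```python
# Minimal IMP sketch; hyperparameters and the rewinding choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

ACTIVATIONS = {"relu": nn.ReLU, "lrelu": nn.LeakyReLU,
               "sigmoid": nn.Sigmoid, "tanh": nn.Tanh}

def make_lenet(act="relu"):
    """Fully connected [784, 300, 100, 10] LeNet as described in the paper."""
    a = ACTIVATIONS[act]
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(784, 300), a(),
                         nn.Linear(300, 100), a(),
                         nn.Linear(100, 10))

def train_one_epoch(model, loader, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def iterative_magnitude_pruning(act="relu", rounds=12, frac=0.2):
    data = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=128, shuffle=True)
    model = make_lenet(act)
    linears = [m for m in model if isinstance(m, nn.Linear)]
    # Remember the initialization so surviving weights can be rewound to it.
    init = {i: (m.weight.detach().clone(), m.bias.detach().clone())
            for i, m in enumerate(linears)}
    for _ in range(rounds):
        train_one_epoch(model, loader)
        # Zero out the smallest-magnitude fraction `frac` of the weights
        # that are still alive, chosen globally across all linear layers.
        prune.global_unstructured([(m, "weight") for m in linears],
                                  pruning_method=prune.L1Unstructured,
                                  amount=frac)
        # Rewind surviving weights (and biases) to their initial values.
        with torch.no_grad():
            for i, m in enumerate(linears):
                m.weight_orig.copy_(init[i][0])
                m.bias.copy_(init[i][1])
    return model  # sparse lottery ticket candidate for the chosen activation

# Example: model = iterative_magnitude_pruning("tanh")
```

With these illustrative settings, twelve rounds that each remove 20% of the remaining weights leave roughly 0.8^12, i.e. about 7%, of the weights nonzero.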