Most Activation Functions Can Win the Lottery Without Excessive Depth
Authors: Rebekka Burkholz
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 (Experiments): To demonstrate that our theoretical results make realistic claims, we present three sets of experiments that highlight different advantages of the (L+1)-construction and the 2L-construction. In all cases, we emulate our constructive existence proofs by pruning source networks to approximate a given target network. All experiments were conducted on a machine with Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz processor and GPU NVIDIA GeForce RTX 3080 Ti. Table 1 caption: LT pruning results on MNIST. Averages and 0.95 standard confidence intervals are reported for 5 independent source network initializations. Parameters are counted in packs of 1000. |
| Researcher Affiliation | Academia | Rebekka Burkholz, CISPA Helmholtz Center for Information Security, 66123 Saarbrücken, Germany, burkholz@cispa.de |
| Pseudocode | No | The paper contains detailed proof outlines (e.g., "Proof Outline" for Theorem 2.5 and 2.6) which describe steps, but these are not formatted as pseudocode or an algorithm block. |
| Open Source Code | Yes | Code is available on GitHub (RelationalML/LT-existence). |
| Open Datasets | Yes | As in the influential work [13], we use Iterative Magnitude Pruning (IMP) on LeNet networks with architecture [784, 300, 100, 10] to find LTs that achieve a good performance on the MNIST classification task [7]. |
| Dataset Splits | No | The paper mentions training on the 'MNIST classification task' and on 'tiny-ImageNet training data' and evaluating on 'tiny-ImageNet test data', but does not specify a validation split or split percentages (e.g., 80/10/10, or specific counts for validation). |
| Hardware Specification | Yes | All experiments were conducted on a machine with Intel(R) Core(TM) i9-10850K CPU @ 3.60GHz processor and GPU NVIDIA GeForce RTX 3080 Ti. |
| Software Dependencies | No | Using the PyTorch implementation of the GitHub repository open_lth with MIT license, we arrive at a target network for each of four considered activation functions after 12 pruning steps: ReLU, LReLU, Sigmoid, and Tanh. |
| Experiment Setup | Yes | Using the PyTorch implementation of the GitHub repository open_lth with MIT license, we arrive at a target network for each of four considered activation functions after 12 pruning steps: ReLU, LReLU, Sigmoid, and Tanh. Their performance and number of nonzero parameters are reported in Table 1 in the target column alongside our results for the (L+1)-construction and our 2L-construction, which achieve a similar performance. A minimal illustrative sketch of this IMP setup is given after the table. |
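
The Experiment Setup row quotes the paper's use of IMP on a LeNet-style [784, 300, 100, 10] network for MNIST with four activation functions (ReLU, LReLU, Sigmoid, Tanh) and 12 pruning rounds via open_lth. The sketch below is a minimal, hedged illustration of such an IMP loop, not the paper's code: the pruning rate, learning rate, epochs per round, and the choice to rewind surviving weights to their initial values are assumptions made for illustration, and open_lth configures these differently.

```python
# Minimal sketch (not the paper's code) of IMP on a [784, 300, 100, 10] MLP
# for MNIST with a configurable activation. Hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

ACTIVATIONS = {"relu": nn.ReLU, "lrelu": nn.LeakyReLU,
               "sigmoid": nn.Sigmoid, "tanh": nn.Tanh}

def make_lenet(act: str) -> nn.Sequential:
    """[784, 300, 100, 10] fully connected network with the chosen activation."""
    a = ACTIVATIONS[act]
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(784, 300), a(),
                         nn.Linear(300, 100), a(),
                         nn.Linear(100, 10))

def train_one_round(model, loader, epochs=1, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def nonzero_params(layers):
    """Count surviving (nonzero) weights, as reported in the paper's Table 1."""
    return sum(int((m.weight != 0).sum()) for m in layers)

def imp(act="relu", rounds=12, prune_rate=0.2):
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=128, shuffle=True)

    model = make_lenet(act)
    layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    init = [(m.weight.detach().clone(), m.bias.detach().clone()) for m in layers]

    for r in range(rounds):
        train_one_round(model, loader)
        # Prune globally by weight magnitude up to the cumulative sparsity
        # 1 - (1 - prune_rate)^(r+1); previously pruned (zero) weights are
        # re-selected, so each round removes ~20% of the *remaining* weights.
        target_sparsity = 1.0 - (1.0 - prune_rate) ** (r + 1)
        prune.global_unstructured([(m, "weight") for m in layers],
                                  pruning_method=prune.L1Unstructured,
                                  amount=target_sparsity)
        # Lottery-ticket rewinding: reset surviving weights to initialization.
        # (`weight_orig` is the reparametrized tensor created by torch pruning.)
        with torch.no_grad():
            for m, (w0, b0) in zip(layers, init):
                m.weight_orig.copy_(w0)
                m.bias.copy_(b0)
        print(f"{act}, round {r + 1}: {nonzero_params(layers)} nonzero weights")
    return model

if __name__ == "__main__":
    for name in ACTIVATIONS:
        imp(act=name)
```

Passing a cumulative `amount` to `prune.global_unstructured` keeps earlier masks intact while pruning a fixed fraction of the still-surviving weights each round, which is the iterative schedule IMP relies on; the per-round fraction of 20% is an assumption, not a value taken from the paper.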