Globally Optimal Training of Neural Networks with Threshold Activation Functions
Authors: Tolga Ergen, Halil Ibrahim Gulluk, Jonathan Lacotte, Mert Pilanci
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical results with various numerical experiments. |
| Researcher Affiliation | Academia | Department of Electrical Engineering Stanford University Stanford, CA 94305, USA |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories. |
| Open Datasets | Yes | For this experiment, we use CIFAR-10 (Krizhevsky et al., 2014), MNIST (LeCun), and the datasets in the UCI repository (Dua & Graff, 2017) which are preprocessed as in Fernández-Delgado et al. (2014). |
| Dataset Splits | No | The paper states "We also use the 80%-20% splitting ratio for the training and test sets of the UCI datasets," but does not give explicit split details for CIFAR-10 or MNIST. |
| Hardware Specification | Yes | We first note that all of the experiments in the paper are run on a single laptop with Intel(R) Core(TM) i7-7700HQ CPU and 16GB of RAM. |
| Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2019)" but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We also tune the learning rate of STE by performing a grid search on the set {5e-1, 1e-1, 5e-2, 1e-2, 5e-3, 1e-3}. As illustrated in Figure 5, the non-convex training heuristic STE fails to achieve the global minimum obtained by our convex training algorithm for 5 different initialization trials. |
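
The quoted setup tunes a straight-through estimator (STE) for a threshold-activation network by grid search over the listed learning rates. The sketch below is a minimal illustration of that procedure, not the authors' code: the network width, synthetic data, loss, epoch count, and the clipped-identity backward pass are placeholder assumptions.

```python
# Hedged sketch (assumptions noted above): grid search over STE learning rates
# for a two-layer network with threshold (unit-step) activations in PyTorch.
import torch
import torch.nn as nn


class ThresholdSTE(torch.autograd.Function):
    """Unit-step activation with a straight-through (clipped identity) gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the gradient through only where the pre-activation is small.
        return grad_output * (x.abs() <= 1).float()


class TwoLayerThresholdNet(nn.Module):
    def __init__(self, d, m):
        super().__init__()
        self.hidden = nn.Linear(d, m)
        self.output = nn.Linear(m, 1)

    def forward(self, x):
        return self.output(ThresholdSTE.apply(self.hidden(x)))


def train_ste(X, y, lr, m=50, epochs=200):
    """Train with SGD at a fixed learning rate and return the final training loss."""
    model = TwoLayerThresholdNet(X.shape[1], m)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    X, y = torch.randn(100, 10), torch.randn(100, 1)  # placeholder data
    grid = [5e-1, 1e-1, 5e-2, 1e-2, 5e-3, 1e-3]       # learning rates quoted above
    best_lr = min(grid, key=lambda lr: train_ste(X, y, lr))
    print("best STE learning rate:", best_lr)
```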