Globally Optimal Training of Neural Networks with Threshold Activation Functions

Authors: Tolga Ergen, Halil Ibrahim Gulluk, Jonathan Lacotte, Mert Pilanci

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| Research Type | Experimental | We corroborate our theoretical results with various numerical experiments. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories. |
| Open Datasets | Yes | For this experiment, we use CIFAR-10 (Krizhevsky et al., 2014), MNIST (LeCun), and the datasets in the UCI repository (Dua & Graff, 2017) which are preprocessed as in Fernández-Delgado et al. (2014). |
| Dataset Splits | No | We also use the 80%/20% splitting ratio for the training and test sets of the UCI datasets. |
| Hardware Specification | Yes | We first note that all of the experiments in the paper are run on a single laptop with Intel(R) Core(TM) i7-7700HQ CPU and 16GB of RAM. |
| Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2019)" but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We also tune the learning rate of STE by performing a grid search on the set {5e-1, 1e-1, 5e-2, 1e-2, 5e-3, 1e-3}. As illustrated in Figure 5, the non-convex training heuristic STE fails to achieve the global minimum obtained by our convex training algorithm for 5 different initialization trials. |
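For illustration, below is a minimal PyTorch sketch of the kind of setup quoted in the Experiment Setup row: a two-layer network with threshold (Heaviside) activations trained via the straight-through estimator (STE), with the learning rate swept over the grid reported in the paper. The network width, epoch count, synthetic data, and helper names (ThresholdSTE, ThresholdNet, grid_search) are assumptions for demonstration, not the authors' code.

```python
# Hypothetical sketch of STE training with a learning-rate grid search.
import torch
import torch.nn as nn


class ThresholdSTE(torch.autograd.Function):
    """Heaviside step in the forward pass; identity (straight-through) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients through the non-differentiable step


class ThresholdNet(nn.Module):
    """Two-layer network with threshold activations (widths are placeholders)."""

    def __init__(self, d_in, width):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, 1)

    def forward(self, x):
        return self.fc2(ThresholdSTE.apply(self.fc1(x)))


def grid_search(X, y, lrs=(5e-1, 1e-1, 5e-2, 1e-2, 5e-3, 1e-3), epochs=200):
    """Train with SGD for each learning rate and return the best final training loss."""
    best_lr, best_loss = None, float("inf")
    for lr in lrs:
        torch.manual_seed(0)  # same initialization for every learning rate
        model = ThresholdNet(X.shape[1], width=50)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
        if loss.item() < best_loss:
            best_lr, best_loss = lr, loss.item()
    return best_lr, best_loss


if __name__ == "__main__":
    # Synthetic data as a stand-in for the CIFAR-10 / MNIST / UCI setups.
    X = torch.randn(128, 20)
    y = torch.randn(128, 1)
    print(grid_search(X, y))
```

The fixed seed keeps the initialization identical across learning rates, so the sweep isolates the effect of the step size; repeating the sweep with different seeds would correspond to the multiple initialization trials the paper compares against its convex training algorithm.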