Optimal Sets and Solution Paths of ReLU Networks

Authors: Aaron Mishkin, Mert Pilanci

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experiment with pruning ReLU networks using this approach in Section 5 and show that it is more effective than naive pruning strategies." From Section 5 (Experiments): "Through convex reformulations, we have characterized the optimal sets of ReLU networks, minimal networks, and sensitivity results. Our goal in this section is to illustrate the power of our framework for analyzing ReLU networks and developing new algorithms. Tuning: We first consider a tuning task on 10 binary classification datasets from the UCI repository (Dua & Graff, 2017). For each dataset, we do a train/validation/test split, fit a two-layer ReLU model on the training set, and then compute the minimum ℓ2-norm model. We use this to explore the optimal set in three ways: (i) we compute an extreme point that (approximately) maximizes the model's ℓ2-norm; (ii) we minimize the validation MSE over W*(λ); (iii) we minimize the test MSE over W*(λ)." (These three explorations are written out as optimization problems after the table.)
Researcher Affiliation | Academia | Aaron Mishkin, Department of Computer Science, Stanford University; Mert Pilanci, Department of Electrical Engineering, Stanford University.
Pseudocode | Yes | Algorithm 1: Optimal Solution Pruning. Algorithm 2: Approximate ReLU Pruning.
Open Source Code | Yes | "Code to replicate all of our experiments is provided at https://github.com/pilancilab/relu_optimal_sets."
Open Datasets | Yes | "Tuning: We first consider a tuning task on 10 binary classification datasets from the UCI repository (Dua & Graff, 2017). ... Figure 3 presents similar results for two binary tasks taken from the CIFAR-10 dataset (Krizhevsky et al., 2009). We provide experiments on additional datasets, including MNIST (LeCun et al., 1998), and experimental details in Appendix D."
Dataset Splits | Yes | "For each dataset, we use a random 60/20/20 split of the data into train, validation, and test sets." (A sketch of this split follows the table.)
Hardware Specification | No | The acknowledgements mention support from "the Stanford Research Computing Center" and "the ACCESS AI Chip Center for Emerging Smart Systems through InnoHK, Hong Kong SAR", but no specific hardware (e.g., GPU or CPU models, memory) is reported.
Software Dependencies | Yes | "We use the commercial interior point method MOSEK (ApS, 2022) through the interface provided by CVXPY (Diamond & Boyd, 2016) to compute the initial model which is then tuned." The cited reference is MOSEK ApS, MOSEK Optimizer API for Python, Version 9, 2022. (A CVXPY/MOSEK sketch follows the table.)
Experiment Setup | Yes | "For each dataset, we use fixed λ = 0.001 and a maximum of 100 neurons. We modify the tolerances of this method to use τ = 10⁻⁸ for measuring both primal convergence and violation of the constraints. ... We repeat each experiment five times with different random splits of the data and random resamplings of 500 activation patterns Dᵢ. ... For MNIST, we use λ = 0.01, while we used λ = 0.05 for CIFAR-10. We sample 50 activation patterns Dᵢ for each task, which produces a maximum of 100 neurons in each final model." (A pattern-sampling sketch follows the table.)
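
Exploring the optimal set (Research Type row): the three explorations quoted above can be summarized as optimization problems over the optimal set. This is a restatement of the quoted text, assuming W*(λ) denotes the set of optimal two-layer ReLU networks at regularization strength λ, and MSE_val, MSE_test denote validation and test mean squared error:

  (i)   max over W in W*(λ) of ‖W‖₂        (extreme point with approximately maximal ℓ2-norm)
  (ii)  min over W in W*(λ) of MSE_val(W)
  (iii) min over W in W*(λ) of MSE_test(W)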
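
Dataset Splits row: a minimal sketch of the random 60/20/20 train/validation/test split. The use of scikit-learn and the placeholder data are illustrative assumptions, not the authors' code.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))   # placeholder features
y = rng.integers(0, 2, size=1000)     # placeholder binary labels

# Hold out 20% for test, then split the remaining 80% into 60/20 train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% equals 20% of the full dataset, giving a 60/20/20 split.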
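
Software Dependencies row: a minimal sketch of solving a convex reformulation of a two-layer ReLU network with MOSEK through CVXPY. The group-ℓ1 objective and cone constraints follow the standard convex reformulation with sampled activation patterns from the literature the paper builds on; the data, dimensions, and gate sampling are placeholder assumptions, and this is an illustration rather than the authors' exact problem or code.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, P = 200, 10, 50                 # samples, features, sampled activation patterns
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
lam = 1e-3                            # regularization strength (λ = 0.001 in the tuning runs)

# Sample activation patterns D_i = diag(1[X g_i >= 0]) from random Gaussian gates g_i.
G = rng.standard_normal((d, P))
D = (X @ G >= 0).astype(float)        # n x P matrix whose columns are 0/1 patterns

V = cp.Variable((d, P))               # weights for the "positive" neuron of each pattern
W = cp.Variable((d, P))               # weights for the "negative" neuron of each pattern
preds = cp.sum(cp.multiply(D, X @ (V - W)), axis=1)
group_norms = cp.sum(cp.norm(V, 2, axis=0)) + cp.sum(cp.norm(W, 2, axis=0))
constraints = [cp.multiply(2 * D - 1, X @ V) >= 0,   # keep each neuron consistent with its pattern
               cp.multiply(2 * D - 1, X @ W) >= 0]
objective = cp.Minimize(0.5 * cp.sum_squares(preds - y) + lam * group_norms)
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.MOSEK)        # requires a MOSEK license; an open-source solver such as cp.SCS also works
print(problem.status, problem.value)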
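
Experiment Setup row: the sketch below shows one standard way to generate the activation patterns Dᵢ referenced above: sample random Gaussian gate vectors, record the induced 0/1 patterns on the training data, and keep the unique ones. This is an assumption about the sampling procedure for illustration; the authors' implementation may differ. The cap of 100 neurons is consistent with 50 patterns when each pattern contributes a pair of neurons, as in the reformulation sketched above.

import numpy as np

def sample_activation_patterns(X: np.ndarray, num_samples: int, seed: int = 0) -> np.ndarray:
    """Sample num_samples random gates and return the unique 0/1 activation patterns."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], num_samples))   # random Gaussian gate vectors
    patterns = (X @ G >= 0).astype(float)                # n x num_samples indicator patterns
    return np.unique(patterns, axis=1)                   # drop duplicate columns

X = np.random.default_rng(1).standard_normal((200, 10))
D_tuning = sample_activation_patterns(X, num_samples=500)   # 500 resampled patterns, as in the repeated experiments
D_pruning = sample_activation_patterns(X, num_samples=50)   # 50 patterns, giving at most 100 neurons in the final model
print(D_tuning.shape, D_pruning.shape)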