VC dimension of partially quantized neural networks in the overparametrized regime

Authors: Yutong Wang, Clayton Scott

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further demonstrate the expressivity of HANNs empirically. On a panel of 121 UCI datasets, overparametrized HANNs match the performance of state-of-the-art full-precision models." "Neural networks have become an indispensable tool for machine learning practitioners, owing to their impressive performance especially in vision and natural language processing (Goodfellow et al., 2016)."
Researcher Affiliation | Academia | Yutong Wang (1) & Clayton Scott (1,2); (1) Department of Electrical Engineering and Computer Science, (2) Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA; {yutongw,clayscot}@umich.edu
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "All code for downloading and parsing the data, training the models, and generating plots in this manuscript are available at https://github.com/YutongWangUMich/HANN."
Open Datasets | Yes | "We benchmark the empirical performance of HANNs on a panel of 121 UCI datasets, following several recent neural network and neural tangent kernel works (Klambauer et al., 2017; Wu et al., 2018; Arora et al., 2019; Shankar et al., 2020)."
Dataset Splits | Yes | "We use the same train, validation, and test sets as in Klambauer et al. (2017). The reported accuracies on the held-out test set are based on the model with the highest validation accuracy."
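The selection rule quoted above (report test accuracy for the model that scores best on validation) can be sketched as follows. This is an illustrative reconstruction, not the paper's code, and the candidate records are made-up numbers.

```python
def select_by_validation(candidates):
    """Return the test accuracy of the candidate with the highest
    validation accuracy (ties broken by first occurrence)."""
    best = max(candidates, key=lambda c: c["val_acc"])
    return best["test_acc"]

# Hypothetical results for three trained models (one per dropout rate).
candidates = [
    {"dropout": 0.10, "val_acc": 0.81, "test_acc": 0.79},
    {"dropout": 0.25, "val_acc": 0.84, "test_acc": 0.82},
    {"dropout": 0.50, "val_acc": 0.80, "test_acc": 0.83},
]

print(select_by_validation(candidates))  # → 0.82
```

Note that the third model has the best test accuracy, but the rule correctly ignores that: selection may only use the validation split.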
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., specific GPU/CPU models).
Software Dependencies | Yes | "Our implementation uses TensorFlow (Abadi et al., 2016) with the Larq (Geiger & Team, 2020) library for training neural networks with threshold activations." Code fragments cited: `import larq as lq`; `qtz = lq.quantizers.SwishSign()`; `from tensorflow.keras.layers import Dense, Dropout`; `from tensorflow.keras.layers import Dense, Add`.
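Quantizers like Larq's SwishSign binarize activations to {-1, +1} in the forward pass while using a smooth surrogate for the backward pass. The NumPy sketch below illustrates the general pattern with a plain straight-through surrogate; it is a stand-in for illustration only and does not reproduce SwishSign's swish-shaped gradient.

```python
import numpy as np

def binarize_forward(x):
    """Forward pass: hard sign, mapping every input to -1.0 or +1.0."""
    return np.where(x >= 0, 1.0, -1.0)

def straight_through_grad(x, clip=1.0):
    """Surrogate backward pass: pass gradients through unchanged where
    |x| <= clip, and block them elsewhere (the hard sign itself has
    zero gradient almost everywhere, so training needs a surrogate)."""
    return (np.abs(x) <= clip).astype(float)

x = np.array([-1.5, -0.2, 0.0, 0.7])
print(binarize_forward(x))       # [-1. -1.  1.  1.]
print(straight_through_grad(x))  # [0. 1. 1. 1.]
```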
Experiment Setup | Yes | "HANN15 is trained with a hyperparameter grid of size 3 where only the dropout rate is tuned. The hyperparameters are summarized in Table 2. The model with the highest smoothed validation accuracy is chosen." Table 2: Optimizer: SGD; Learning rate: 0.01; Dropout rate: {0.1, 0.25, 0.5}; Minibatch size: 128; Boolean function: 1-hidden-layer ResNet with 1000 hidden nodes; Epochs: 100 for MiniBooNE, 5000 for all others.
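The setup above can be expressed as a small configuration sketch. This is a reconstruction under assumptions: the paper's exact smoothing of validation accuracy is not quoted here, so a simple moving average over recent epochs is used as a placeholder.

```python
# Tuned grid (size 3) and fixed hyperparameters, per Table 2.
GRID = {"dropout_rate": [0.1, 0.25, 0.5]}
FIXED = {"optimizer": "SGD", "learning_rate": 0.01, "minibatch_size": 128}

def epochs_for(dataset):
    """100 epochs for MiniBooNE, 5000 for all other UCI datasets."""
    return 100 if dataset == "miniboone" else 5000

def smoothed(val_accs, window=5):
    """Placeholder smoothing: moving average of the last `window`
    validation accuracies (the paper's exact scheme may differ)."""
    tail = val_accs[-window:]
    return sum(tail) / len(tail)

assert len(GRID["dropout_rate"]) == 3  # hyperparameter grid of size 3
print(epochs_for("miniboone"), epochs_for("abalone"))          # 100 5000
print(round(smoothed([0.70, 0.72, 0.74, 0.76, 0.78, 0.80]), 3))  # 0.76
```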