VC dimension of partially quantized neural networks in the overparametrized regime
Authors: Yutong Wang, Clayton Scott
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further demonstrate the expressivity of HANNs empirically. On a panel of 121 UCI datasets, overparametrized HANNs match the performance of state-of-the-art full-precision models. Neural networks have become an indispensable tool for machine learning practitioners, owing to their impressive performance especially in vision and natural language processing (Goodfellow et al., 2016). |
| Researcher Affiliation | Academia | Yutong Wang¹ & Clayton Scott¹,²; ¹Department of Electrical Engineering and Computer Science, ²Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA; {yutongw,clayscot}@umich.edu |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code for downloading and parsing the data, training the models, and generating plots in this manuscript are available at https://github.com/YutongWangUMich/HANN. |
| Open Datasets | Yes | We benchmark the empirical performance of HANNs on a panel of 121 UCI datasets, following several recent neural network and neural tangent kernel works (Klambauer et al., 2017; Wu et al., 2018; Arora et al., 2019; Shankar et al., 2020). |
| Dataset Splits | Yes | We use the same train, validation, and test sets as in Klambauer et al. (2017). The reported accuracies on the held-out test set are based on the model with the highest validation accuracy. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., specific GPU/CPU models). |
| Software Dependencies | Yes | Our implementation uses TensorFlow (Abadi et al., 2016) with the Larq (Geiger & Team, 2020) library for training neural networks with threshold activations. Code excerpts: `import larq as lq`; `qtz = lq.quantizers.SwishSign()`; `from tensorflow.keras.layers import Dense, Dropout`; `from tensorflow.keras.layers import Dense, Add` (see the model sketch after this table). |
| Experiment Setup | Yes | HANN15 is trained with a hyperparameter grid of size 3 where only the dropout rate is tuned. The hyperparameters are summarized in Table 2. The model with the highest smoothed validation accuracy is chosen. Table 2 settings: optimizer: SGD; learning rate: 0.01; dropout rate: {0.1, 0.25, 0.5}; minibatch size: 128; Boolean function: 1-hidden-layer ResNet with 1000 hidden nodes; epochs: 100 for MiniBooNE, 5000 for all others (see the training sketch after this table). |
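
The model sketch referenced in the Software Dependencies row: a minimal, hedged reconstruction of a HANN-style network from the quoted imports (`Dense`, `Dropout`, `Add`) and the `SwishSign` threshold activation. The layer widths, the residual wiring, and the `build_hann` helper are illustrative assumptions, not the authors' released code; only the threshold activation and the 1-hidden-layer ResNet Boolean function with 1000 hidden nodes come from the paper.

```python
import tensorflow as tf
import larq as lq
from tensorflow.keras.layers import Add, Dense, Dropout, Input

# Larq's SwishSign: a sign (threshold) activation with a smooth
# surrogate gradient for backpropagation, as in the quoted snippet.
qtz = lq.quantizers.SwishSign()

def build_hann(input_dim, n_hyperplanes=100, n_hidden=1000,
               n_classes=2, dropout=0.25):
    """Hypothetical HANN-style model; layer sizes are illustrative."""
    x_in = Input(shape=(input_dim,))
    # First layer: full-precision weights, thresholded (sign) activations.
    h = Dense(n_hyperplanes, activation=qtz)(x_in)
    # "Boolean function": a 1-hidden-layer ResNet block, 1000 hidden nodes.
    z = Dense(n_hidden, activation=qtz)(h)
    z = Dropout(dropout)(z)
    z = Dense(n_hyperplanes)(z)
    h = Add()([h, z])  # residual connection
    out = Dense(n_classes)(h)  # full-precision output logits
    return tf.keras.Model(x_in, out)
```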
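The training sketch referenced in the Experiment Setup row: a hedged reconstruction of the Table 2 loop (SGD, learning rate 0.01, minibatch size 128, dropout grid {0.1, 0.25, 0.5}). It reuses the hypothetical `build_hann` from the sketch above and synthetic stand-in data; the paper selects by smoothed validation accuracy, which is simplified here to a plain maximum over epochs.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data; the paper uses the 121 UCI datasets.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(512, 20)).astype("float32")
y_train = rng.integers(0, 2, size=512)
X_val = rng.normal(size=(128, 20)).astype("float32")
y_val = rng.integers(0, 2, size=128)

EPOCHS = 100  # Table 2: 100 for MiniBooNE, 5000 for all other datasets

best_model, best_val_acc = None, -1.0
for rate in (0.1, 0.25, 0.5):  # the size-3 dropout grid
    model = build_hann(input_dim=X_train.shape[1], dropout=rate)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    history = model.fit(
        X_train, y_train,
        batch_size=128,
        epochs=EPOCHS,
        validation_data=(X_val, y_val),
        verbose=0,
    )
    # The paper picks the model with the highest *smoothed* validation
    # accuracy; a plain max over epochs is used here as a simplification.
    val_acc = max(history.history["val_accuracy"])
    if val_acc > best_val_acc:
        best_model, best_val_acc = model, val_acc
```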