CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Authors: Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks. On ImageNet classification, our CondConv approach applied to EfficientNet-B0 achieves state-of-the-art performance of 78.3% accuracy with only 413M multiply-adds. |
| Researcher Affiliation | Industry | Brandon Yang Google Brain bcyang@google.com Gabriel Bender Google Brain gbender@google.com Quoc V. Le Google Brain qvl@google.com Jiquan Ngiam Google Brain jngiam@google.com |
| Pseudocode | No | The paper describes the CondConv formulation using mathematical equations and textual explanations, but it does not include a structured pseudocode block or an algorithm figure. |
| Open Source Code | Yes | Code and checkpoints for the CondConv TensorFlow layer and CondConv-EfficientNet models are available at: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv. |
| Open Datasets | Yes | We evaluate our approach on the ImageNet 2012 classification dataset [35]. The ImageNet dataset consists of 1.28 million training images and 50K validation images from 1000 classes. We next evaluate the effectiveness of CondConv on a different task and dataset with the COCO object detection dataset [24]. |
| Dataset Splits | Yes | The ImageNet dataset consists of 1.28 million training images and 50K validation images from 1000 classes. We train all models on the entire training set and compare the single-crop top-1 validation set accuracy with input image resolution 224x224. Following Howard et al. [15], we train on the combined COCO training and validation sets excluding 8,000 minival images, on which we evaluate our networks. |
| Hardware Specification | No | The paper mentions 'Current accelerators are optimized to train on large batch convolutions' and 'our hardware configuration' but does not specify any particular GPU, CPU, or TPU models used for the experiments. |
| Software Dependencies | No | The paper states 'Code and checkpoints for the CondConv TensorFlow layer', implying the use of TensorFlow, but it does not specify any version numbers for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | For MobileNetV1, MobileNetV2, and ResNet-50, we use the same training hyperparameters for all models on ImageNet, following [21], except we use BatchNorm momentum of 0.9 and disable exponential moving average on weights. For MnasNet [41] and EfficientNet [42], we use the same training hyperparameters as the original papers, with the batch size, learning rate, and training steps scaled appropriately for our hardware configuration. First, we use Dropout [39] on the input to the fully-connected layer preceding the logits, with keep probability between 0.6 and 1.0. Second, we also add data augmentation using the AutoAugment [6] ImageNet policy and Mixup [49] with α = 0.2. |
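Since the paper itself provides no pseudocode, the core CondConv idea referenced above can be illustrated with a minimal NumPy sketch: each example's routing weights mix a bank of expert kernels into a single per-example kernel, which is then applied as an ordinary convolution. This is a simplified illustration assuming 1x1 kernels and a single input example (so the convolution reduces to a matrix multiply); the function and variable names are illustrative and are not taken from the released TensorFlow code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def condconv_1x1(x, experts, routing_matrix):
    """Per-example CondConv with 1x1 kernels (illustrative sketch).

    x:              (H, W, C_in) feature map for one example.
    experts:        (n_experts, C_in, C_out) bank of expert kernels.
    routing_matrix: (C_in, n_experts) learned routing weights.
    """
    # Routing function: sigmoid of a linear map of the globally pooled input.
    pooled = x.mean(axis=(0, 1))                   # (C_in,)
    alpha = sigmoid(pooled @ routing_matrix)       # (n_experts,)
    # Combine experts into one per-example kernel, then convolve once.
    kernel = np.tensordot(alpha, experts, axes=1)  # (C_in, C_out)
    return x @ kernel                              # (H, W, C_out)
```

Because convolution is linear in the kernel, mixing the kernels first and convolving once is mathematically equivalent to running all expert convolutions and mixing their outputs, but it costs only a single convolution per example at inference time, which is the efficiency argument the paper makes.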