Batch-Shaping for Learning Conditional Channel Gated Networks

Authors: Babak Ehteshami Bejnordi, Tijmen Blankevoort, Max Welling

ICLR 2020

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We present results on CIFAR-10 and ImageNet datasets for image classification, and Cityscapes for semantic segmentation. Our results show that our method can slim down large architectures conditionally, such that the average computational cost on the data is on par with a smaller architecture, but with higher accuracy."

Researcher Affiliation | Industry | "Babak Ehteshami Bejnordi, Tijmen Blankevoort & Max Welling, Qualcomm AI Research, Amsterdam, The Netherlands, {behtesha,tijmen,mwelling}@qti.qualcomm.com"

Pseudocode | Yes | "The pseudo-code for the implementation of the Batch-Shaping loss is presented in Appendix A."

Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.

Open Datasets | Yes | "We evaluate the performance of our method on two image classification benchmarks: CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). We additionally report preliminary results on the Cityscapes semantic segmentation benchmark (Cordts et al., 2016)."

Dataset Splits | Yes | "The original PSP network achieves an overall IoU (intersection over union) of 0.706 with a pixel-level accuracy of 0.929 on the validation set. ... Figure 5 shows the distribution of gates on the ImageNet validation set for our ResNet34-BAS and ResNet34-L0 models."

Hardware Specification | Yes | "All the reported inference times were measured using a machine equipped with an Intel Xeon E5-1620 v4 CPU and an Nvidia GTX 1080 Ti GPU."

Software Dependencies | No | The paper mentions implementing aspects in PyTorch ("Computation can be done on sliced tensors, which we implemented in Pytorch."), but it does not specify any version numbers for PyTorch or any other software libraries or solvers used.

Experiment Setup | Yes | "The training details and hyperparameters for our gated networks trained on CIFAR-10, ImageNet, and Cityscapes are provided in Appendix B. ... We trained the models for 500 epochs with a mini-batch of 256. The initial learning rate was 0.1 and it was divided by 10 at epochs 300, 375, and 450. ... For the L0 loss we used γ values of {0, 1, 2, 5, 10, 15, 20}×10⁻² to generate different trade-off points."
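The reported learning-rate schedule (initial LR 0.1, divided by 10 at epochs 300, 375, and 450 over 500 epochs) is a standard step schedule and can be sketched as a small helper. This is an illustrative reconstruction from the quoted hyperparameters only, not code from the paper; the function name and signature are our own.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(300, 375, 450), gamma=0.1):
    """Step schedule from the quoted setup: start at `base_lr` and
    multiply by `gamma` (i.e. divide by 10) at each milestone epoch."""
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    return lr

# Example: the schedule over the 500 training epochs.
# Epochs 0-299 use 0.1, 300-374 use 0.01, 375-449 use 0.001, 450-499 use 0.0001.
schedule = [learning_rate(e) for e in range(500)]
```

In a PyTorch training loop the same schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[300, 375, 450], gamma=0.1)`.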