Visual Concept Reasoning Networks

Authors: Taesup Kim, Sungwoong Kim, Yoshua Bengio

AAAI 2021, pp. 8172-8180 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on visual recognition tasks such as image classification, semantic segmentation, object detection, scene recognition, and action recognition show that our proposed model, VCRNet, consistently improves the performance by increasing the number of parameters by less than 1%.
Researcher Affiliation | Collaboration | Taesup Kim* (1, 2), Sungwoong Kim (2), Yoshua Bengio (1); 1: Mila, Université de Montréal; 2: Kakao Brain; *now at Amazon Web Services (taesup@amazon.com).
Pseudocode | No | The paper describes the architecture and components in detail using mathematical formulations and descriptive text but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | We conduct experiments on a large-scale image classification task on the ImageNet dataset (Russakovsky et al. 2015). ... We further do some experiments on object detection and instance segmentation on the MSCOCO 2017 dataset (Lin et al. 2014). ... Places365 (Zhou et al. 2017) is a dataset labeled with scene semantic categories for the scene recognition task. ... We use the Kinetics-400 dataset (Kay et al. 2017).
Dataset Splits | Yes | The dataset [ImageNet] consists of 1.28M training images and 50K validation images from 1000 different classes. All networks are trained on the training set and evaluated on the validation set by reporting the top-1 and top-5 errors with single center-cropping. ... MSCOCO dataset contains 115K images over 80 categories for training, 5K for validation. ... Places365-Standard setting that the train set has up to 1.8M images from 365 scene classes, and the validation set has 50 images per class. ... Kinetics-400 dataset ... including 400 human action categories with 235K training videos and 20K validation videos. (A minimal evaluation sketch for this ImageNet protocol appears after this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only implies the use of computational resources without specifying them (e.g., GFLOPs are reported, which measure computational cost rather than identify hardware). (A FLOP-counting sketch appears after this table.)
Software Dependencies | No | The paper mentions 'Detectron2' but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | Our training setting is explained in detail in Appendix. ... For evaluation, we always take the final model, which is obtained by exponential moving average (EMA) with the decay value 0.9999. ... We employ and train the Mask R-CNN with FPN (He et al. 2017). We follow the training procedure of the Detectron2 and use the 1x schedule setting. Furthermore, synchronized batch normalization is used instead of freezing all related parameters. ... Additionally, we insert Dropout (Srivastava et al. 2014) layers in residual blocks with p = 0.02 to avoid some over-fitting. (An EMA sketch appears after this table.)
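The paper does not release code, so the ImageNet protocol quoted in the Dataset Splits row (1.28M training images, 50K validation images, top-1/top-5 errors with single center-cropping) can only be approximated. Below is a minimal sketch assuming PyTorch/torchvision and the standard ImageNet preprocessing; the dataset path, batch size, and normalization statistics are placeholder assumptions, not values taken from the paper.

```python
import torch
from torchvision import datasets, transforms

# Standard single center-crop validation transform (assumed, not specified in
# the paper): resize the shorter side to 256, then take one 224x224 crop.
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageNet("/path/to/imagenet", split="val", transform=val_tf)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, num_workers=8)

@torch.no_grad()
def top1_top5_error(model, loader, device="cuda"):
    """Top-1 / top-5 error (%) over the 50K-image validation split."""
    model.eval().to(device)
    correct1 = correct5 = total = 0
    for images, targets in loader:
        top5 = model(images.to(device)).topk(5, dim=1).indices.cpu()
        correct1 += (top5[:, 0] == targets).sum().item()
        correct5 += (top5 == targets.unsqueeze(1)).any(dim=1).sum().item()
        total += targets.size(0)
    return 100.0 * (1 - correct1 / total), 100.0 * (1 - correct5 / total)
```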
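Since the Hardware Specification row notes that only GFLOPs are reported, one way to cross-check such numbers is fvcore's FLOP counter, which ships as a dependency of Detectron2. This is a sketch under assumptions: fvcore is installed, and a torchvision ResNet-50 stands in for the unreleased VCRNet.

```python
import torch
from fvcore.nn import FlopCountAnalysis
from torchvision.models import resnet50

# Stand-in model: VCRNet is not released, so a ResNet-50 is used here purely
# to illustrate how reported GFLOP numbers could be cross-checked.
model = resnet50().eval()
dummy_input = torch.randn(1, 3, 224, 224)

flops = FlopCountAnalysis(model, dummy_input)
# fvcore counts multiply-accumulate operations; divide by 1e9 to get the
# "GFLOPs" figure typically quoted in papers.
print(f"{flops.total() / 1e9:.2f} GFLOPs")
```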
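Of the experiment-setup details quoted above, the EMA evaluation model (decay 0.9999) and the residual-block Dropout rate (p = 0.02) are concrete enough to sketch. The following is a minimal PyTorch sketch, not the authors' implementation: the model is a stand-in, and copying buffers such as batch-norm statistics is a common convention, not something stated in the paper.

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    # Exponential moving average of weights; the paper evaluates the EMA model
    # obtained with decay 0.9999.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
    # Assumed convention (not from the paper): copy buffers, e.g. BN statistics.
    for ema_b, b in zip(ema_model.buffers(), model.buffers()):
        ema_b.copy_(b)

# Usage sketch with a stand-in model (VCRNet is not released); the Dropout
# rate mirrors the quoted p = 0.02.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(p=0.02))
ema_model = copy.deepcopy(model).eval()
for step in range(100):
    # ... one optimizer step on `model` would go here ...
    update_ema(ema_model, model)
```

At the end of training, ema_model rather than model would be evaluated, matching the quoted statement that the final model is obtained by EMA.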