Visual Concept Reasoning Networks

Authors: Taesup Kim, Sungwoong Kim, Yoshua Bengio

AAAI 2021, pp. 8172-8180 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on visual recognition tasks such as image classification, semantic segmentation, object detection, scene recognition, and action recognition show that our proposed model, VCRNet, consistently improves the performance by increasing the number of parameters by less than 1%.
Researcher Affiliation | Collaboration | Taesup Kim* (1, 2), Sungwoong Kim (2), Yoshua Bengio (1); 1: Mila, Université de Montréal; 2: Kakao Brain; *now at Amazon Web Services (taesup@amazon.com).
Pseudocode | No | The paper describes the architecture and components in detail using mathematical formulations and descriptive text but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | We conduct experiments on a large-scale image classification task on the ImageNet dataset (Russakovsky et al. 2015). ... We further do some experiments on object detection and instance segmentation on the MSCOCO 2017 dataset (Lin et al. 2014). ... Places365 (Zhou et al. 2017) is a dataset labeled with scene semantic categories for the scene recognition task. ... We use the Kinetics-400 dataset (Kay et al. 2017).
Dataset Splits | Yes | The dataset [ImageNet] consists of 1.28M training images and 50K validation images from 1000 different classes. All networks are trained on the training set and evaluated on the validation set by reporting the top-1 and top-5 errors with single center-cropping. ... MSCOCO dataset contains 115K images over 80 categories for training, 5K for validation. ... Places365-Standard setting that the train set has up to 1.8M images from 365 scene classes, and the validation set has 50 images per class. ... Kinetics-400 dataset ... including 400 human action categories with 235K training videos and 20K validation videos. (A minimal evaluation sketch for this ImageNet protocol appears after this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only implies the use of computational resources without specifying them (e.g., GFLOPs are reported, which measure computational cost rather than identify hardware). (A FLOP-counting sketch appears after this table.)
Software Dependencies | No | The paper mentions 'Detectron2' but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | Our training setting is explained in detail in Appendix. ... For evaluation, we always take the final model, which is obtained by exponential moving average (EMA) with the decay value 0.9999. ... We employ and train the Mask R-CNN with FPN (He et al. 2017). We follow the training procedure of the Detectron2 and use the 1x schedule setting. Furthermore, synchronized batch normalization is used instead of freezing all related parameters. ... Additionally, we insert Dropout (Srivastava et al. 2014) layers in residual blocks with p = 0.02 to avoid some over-fitting. (An EMA sketch appears after this table.)
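The paper does not release code, so the ImageNet protocol quoted in the Dataset Splits row (1.28M training images, 50K validation images, top-1/top-5 errors with single center-cropping) can only be approximated. Below is a minimal sketch assuming PyTorch/torchvision and the standard ImageNet preprocessing; the dataset path, batch size, and normalization statistics are placeholder assumptions, not values taken from the paper.

```python
import torch
from torchvision import datasets, transforms

# Standard single center-crop validation transform (assumed, not specified in
# the paper): resize the shorter side to 256, then take one 224x224 crop.
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageNet("/path/to/imagenet", split="val", transform=val_tf)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, num_workers=8)

@torch.no_grad()
def top1_top5_error(model, loader, device="cuda"):
    """Top-1 / top-5 error (%) over the 50K-image validation split."""
    model.eval().to(device)
    correct1 = correct5 = total = 0
    for images, targets in loader:
        top5 = model(images.to(device)).topk(5, dim=1).indices.cpu()
        correct1 += (top5[:, 0] == targets).sum().item()
        correct5 += (top5 == targets.unsqueeze(1)).any(dim=1).sum().item()
        total += targets.size(0)
    return 100.0 * (1 - correct1 / total), 100.0 * (1 - correct5 / total)
```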
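Since the Hardware Specification row notes that only GFLOPs are reported, one way to cross-check such numbers is fvcore's FLOP counter, which ships as a dependency of Detectron2. This is a sketch under assumptions: fvcore is installed, and a torchvision ResNet-50 stands in for the unreleased VCRNet.

```python
import torch
from fvcore.nn import FlopCountAnalysis
from torchvision.models import resnet50

# Stand-in model: VCRNet is not released, so a ResNet-50 is used here purely
# to illustrate how reported GFLOP numbers could be cross-checked.
model = resnet50().eval()
dummy_input = torch.randn(1, 3, 224, 224)

flops = FlopCountAnalysis(model, dummy_input)
# fvcore counts multiply-accumulate operations; divide by 1e9 to get the
# "GFLOPs" figure typically quoted in papers.
print(f"{flops.total() / 1e9:.2f} GFLOPs")
```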
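Of the experiment-setup details quoted above, the EMA evaluation model (decay 0.9999) and the residual-block Dropout rate (p = 0.02) are concrete enough to sketch. The following is a minimal PyTorch sketch, not the authors' implementation: the model is a stand-in, and copying buffers such as batch-norm statistics is a common convention, not something stated in the paper.

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    # Exponential moving average of weights; the paper evaluates the EMA model
    # obtained with decay 0.9999.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
    # Assumed convention (not from the paper): copy buffers, e.g. BN statistics.
    for ema_b, b in zip(ema_model.buffers(), model.buffers()):
        ema_b.copy_(b)

# Usage sketch with a stand-in model (VCRNet is not released); the Dropout
# rate mirrors the quoted p = 0.02.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(p=0.02))
ema_model = copy.deepcopy(model).eval()
for step in range(100):
    # ... one optimizer step on `model` would go here ...
    update_ema(ema_model, model)
```

At the end of training, ema_model rather than model would be evaluated, matching the quoted statement that the final model is obtained by EMA.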