Learning to Specialize with Knowledge Distillation for Visual Question Answering

Authors: Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experimental results indeed demonstrate that our method outperforms other baselines for VQA and image classification. |
| Researcher Affiliation | Academia | 1 Computer Vision Lab., POSTECH, Pohang, Korea; 2 Algorithmic Intelligence Lab., KAIST, Daejeon, Korea; 3 Computer Vision Lab., ASRI, Seoul National University, Seoul, Korea |
| Pseudocode | No | The paper describes the training procedure in text (e.g., "Training procedure of MCL-KD is as follows...") but does not present it as structured pseudocode or a labeled algorithm block. |
| Open Source Code | No | The paper refers to publicly available implementations of baseline models (the bottom-up and top-down attention model and CMCL) but does not state that code for the proposed MCL-KD method is released, nor does it provide a link. |
| Open Datasets | Yes | We employ CLEVR and VQA v2.0 datasets to validate our algorithm. CLEVR [14] is constructed for an analysis of various aspects of visual reasoning... VQA v2.0 [9] is a very popular dataset based on images collected from MSCOCO [24]. |
| Dataset Splits | Yes | CLEVR [14]... is composed of 70,000 training images with 699,989 questions and 15,000 validation images with 149,991 questions... VQA v2.0 [9]... consists of 443,757 and 214,354 questions for train and validation, respectively. |
| Hardware Specification | No | The paper mentions training models and memory limitations (batch size reduced from 512 to 256) but does not specify any details about the hardware used, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions the ADAM optimizer and ResNet-101 and refers to external implementations for baselines, but does not specify version numbers for any software dependencies, such as deep learning frameworks or programming languages. |
| Experiment Setup | Yes | All models are optimized using ADAM [17] with a fixed learning rate of 0.0005 and batch size of 64 while the parameters of ResNet-101 are fixed. We set β and T in Eq. 4 to 50 and 0.1, respectively, based on our empirical observations. |
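The reported temperature T = 0.1 is a key hyperparameter of the paper's distillation loss (Eq. 4). The effect of such a temperature can be illustrated with a minimal temperature-scaled softmax, the standard building block of knowledge-distillation objectives; this is an illustrative sketch only, not the authors' code, and the example logits are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T).

    T > 1 flattens the distribution; T < 1 (e.g. the paper's T = 0.1)
    sharpens it toward a near one-hot target.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, purely for illustration.
logits = [2.0, 1.0, 0.1]
probs_t1 = softmax(logits, temperature=1.0)   # moderately peaked
probs_t01 = softmax(logits, temperature=0.1)  # nearly one-hot on argmax
```

With T = 0.1 the scaled logits become [20, 10, 1], so almost all probability mass concentrates on the largest logit, which matches the intuition that a small temperature produces sharp distillation targets.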