Learning to Specialize with Knowledge Distillation for Visual Question Answering
Authors: Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results indeed demonstrate that our method outperforms other baselines for VQA and image classification. |
| Researcher Affiliation | Academia | (1) Computer Vision Lab., POSTECH, Pohang, Korea; (2) Algorithmic Intelligence Lab., KAIST, Daejeon, Korea; (3) Computer Vision Lab., ASRI, Seoul National University, Seoul, Korea |
| Pseudocode | No | The paper describes the training procedure in text (e.g., "Training procedure of MCL-KD is as follows...") but does not present it as structured pseudocode or a labeled algorithm block; a hedged sketch of the objective appears after this table. |
| Open Source Code | No | The paper refers to publicly available implementations of baseline models (bottom-up and top-down attention model and CMCL) but does not state that the code for the proposed MCL-KD method is released or provide a link to it. |
| Open Datasets | Yes | We employ CLEVR and VQA v2.0 datasets to validate our algorithm. CLEVR [14] is constructed for an analysis of various aspects of visual reasoning... VQA v2.0 [9] is a very popular dataset based on images collected from MSCOCO [24]. |
| Dataset Splits | Yes | CLEVR [14]... is composed of 70,000 training images with 699,989 questions and 15,000 validation images with 149,991 questions... VQA v2.0 [9]... consists of 443,757 and 214,354 questions for train and validation, respectively |
| Hardware Specification | No | The paper mentions training models and a memory-driven reduction of the batch size from 512 to 256, but it does not specify the hardware used, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions using ADAM optimizer and ResNet-101, and refers to external implementations for baselines, but does not specify version numbers for any software dependencies like deep learning frameworks or programming languages. |
| Experiment Setup | Yes | All models are optimized using ADAM [17] with fixed learning rate of 0.0005 and batch size of 64 while the parameters of ResNet-101 are fixed. We set β and T in Eq. 4 to 50 and 0.1, respectively, based on our empirical observations. (These settings are mirrored in the second sketch below the table.) |
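
The training procedure of MCL-KD is only described in prose. As a rough illustration of the reported idea (for each example, the k ensemble members with the lowest task loss specialize on it, while the remaining members are regularized by knowledge distillation from a pre-trained base model), here is a minimal PyTorch-style sketch. The function name, the per-example top-k assignment, and the exact form of the distillation term are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def mcl_kd_loss(logits_list, teacher_logits, labels, k=1, beta=50.0, T=0.1):
    """Hypothetical sketch of an MCL-KD style objective.

    logits_list:    list of M tensors [B, C], one per ensemble member
    teacher_logits: [B, C] logits of a pre-trained base (teacher) model
    labels:         [B] ground-truth answer indices
    k:              number of specialized models per example
    beta:           weight of the distillation term (paper reports beta = 50)
    T:              distillation temperature (paper reports T = 0.1)
    """
    # Per-model, per-example task losses: shape [M, B].
    task_losses = torch.stack(
        [F.cross_entropy(logits, labels, reduction="none") for logits in logits_list]
    )

    # Oracle assignment: the k models with the lowest task loss specialize on each example.
    _, topk_idx = task_losses.topk(k, dim=0, largest=False)   # [k, B]
    assign = torch.zeros_like(task_losses)
    assign.scatter_(0, topk_idx, 1.0)                         # 1 = specialized

    # Soft targets from the frozen teacher (no gradient through the teacher).
    with torch.no_grad():
        soft_targets = F.softmax(teacher_logits / T, dim=1)   # [B, C]

    total = 0.0
    for m, logits in enumerate(logits_list):
        # Specialized models minimize the usual task loss on their assigned examples.
        spec = assign[m] * task_losses[m]
        # Non-specialized models are pulled toward the teacher's predictions (KD term).
        kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft_targets,
                      reduction="none").sum(dim=1)            # per-example KL, [B]
        nonspec = (1.0 - assign[m]) * beta * kd
        total = total + (spec + nonspec).mean()
    return total
```

With k = 1 and β = 0 this reduces to plain multiple choice learning; the β-weighted distillation term is what keeps non-specialized members close to the base model's general predictions instead of degenerating.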
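
The reported optimization settings (ADAM with a fixed learning rate of 0.0005, batch size 64, frozen ResNet-101 features, and β = 50, T = 0.1 in Eq. 4) could be wired up roughly as follows; the placeholder ensemble members stand in for the actual VQA models, which are not released.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper's experiment setup.
LEARNING_RATE = 5e-4            # fixed ADAM learning rate
BATCH_SIZE = 64
BETA, TEMPERATURE = 50.0, 0.1   # distillation weight and temperature (Eq. 4)

# Hypothetical stand-ins for the ensemble; the real models read frozen
# ResNet-101 image features, so only the VQA-specific parameters are trained.
models = [nn.Linear(2048, 1000) for _ in range(3)]
trainable = [p for m in models for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=LEARNING_RATE)
```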