Revisiting Bilinear Pooling: A Coding Perspective

Authors: Zhi Gao, Yuwei Wu, Xiaoxun Zhang, Jindou Dai, Yunde Jia, Mehrtash Harandi (pp. 3954-3961)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two challenging tasks, namely image classification and visual question answering, demonstrate that our method surpasses the bilinear pooling technique by a large margin.
Researcher Affiliation | Collaboration | Zhi Gao,1 Yuwei Wu,1 Xiaoxun Zhang,2 Jindou Dai,1 Yunde Jia,1 Mehrtash Harandi3 (1Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China; 2Alibaba Group; 3Department of Electrical and Computer Systems Eng., Monash University, and Data61, Australia)
Pseudocode | No | The paper describes the algorithm steps in paragraph text and presents a network architecture diagram (Figure 2c), but it does not include a formally labeled pseudocode or algorithm block.
Open Source Code | Yes | The code is available at https://github.com/ZhiGaomcislab/FactorizedBilinearCoding.
Open Datasets | Yes | Four datasets are used: Describing Texture Dataset (DTD) (Cimpoi et al. 2014), MINC-2500 (MINC) (Bell et al. 2015), MIT-Indoor (Indoor) (Quattoni and Torralba 2009), and Caltech-UCSD Bird (CUB) (Xie et al. 2013)... We use the VQA 1.0 (Agrawal et al. 2017) and VQA 2.0 (Goyal et al. 2017) datasets.
Dataset Splits | Yes | All models are trained on the training split and evaluated on the validation split.
Hardware Specification | No | The paper does not specify any particular hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components like 'VGG-16 network', 'ResNet-152', 'RNN (LSTM for VQA 1.0 and GRU for VQA 2.0)', and 'GloVe word embedding module' but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Following the work in (Yu and Salzmann 2018), the size of input images in DTD, Indoor, and CUB is 448 × 448, and the size of input images in MINC is 224 × 224. We use the VGG-16 network as the backbone, and layers after the conv5-3 layer are removed. Our FBC module is on top of the conv5-3 layer... We set the rank of U and V as 1, the number of atoms as 512, 2048, and 8192, and λ as 0.001.
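The setup row above fixes the factorized bilinear coding (FBC) hyper-parameters: rank-1 dictionary factors U and V, 512 to 8192 atoms, and λ = 0.001. As a rough illustration of what FBC computes for one local feature, the sketch below solves the underlying sparse-coding problem with plain ISTA. This is a minimal NumPy sketch, not the authors' released code: the function names, the ISTA solver, and the toy dimensions are assumptions (the paper derives a more efficient closed-form-style solution), though the rank-1 atom structure matches the reported configuration.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fbc_encode(x, U, V, lam=0.001, n_iter=200):
    """Sparse code c minimizing
        0.5 * ||x x^T - sum_k c_k u_k v_k^T||_F^2 + lam * ||c||_1
    via ISTA. The rank-1 structure of the atoms u_k v_k^T means the
    d x d outer product x x^T never has to be formed explicitly."""
    # Gram matrix of the atoms under the Frobenius inner product:
    # <u_k v_k^T, u_l v_l^T>_F = (u_k . u_l) * (v_k . v_l)
    G = (U.T @ U) * (V.T @ V)              # shape (K, K)
    # Correlation of each atom with the target: <x x^T, u_k v_k^T>_F
    b = (U.T @ x) * (V.T @ x)              # shape (K,)
    step = 1.0 / np.linalg.norm(G, 2)      # 1 / Lipschitz constant of gradient
    c = np.zeros(U.shape[1])
    for _ in range(n_iter):
        c = soft_threshold(c - step * (G @ c - b), step * lam)
    return c
```

In the paper's configuration, d would be 512 (the conv5-3 channel count) and K one of {512, 2048, 8192}; thanks to the rank-1 atoms, forming G and b costs only O(dK²) and O(dK) rather than touching a d × d bilinear matrix.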