Revisiting Bilinear Pooling: A Coding Perspective
Authors: Zhi Gao, Yuwei Wu, Xiaoxun Zhang, Jindou Dai, Yunde Jia, Mehrtash Harandi
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two challenging tasks, namely image classification and visual question answering, demonstrate that our method surpasses the bilinear pooling technique by a large margin. |
| Researcher Affiliation | Collaboration | Zhi Gao,¹ Yuwei Wu,¹ Xiaoxun Zhang,² Jindou Dai,¹ Yunde Jia,¹ Mehrtash Harandi³ — ¹Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China; ²Alibaba Group; ³Department of Electrical and Computer Systems Eng., Monash University, and Data61, Australia |
| Pseudocode | No | The paper describes the algorithm steps in paragraph text and presents a network architecture diagram (Figure 2c), but it does not include a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/ZhiGaomcislab/FactorizedBilinearCoding. |
| Open Datasets | Yes | Four datasets are used: Describing Texture Dataset (DTD) (Cimpoi et al. 2014), MINC-2500 (MINC) (Bell et al. 2015), MIT-Indoor (Indoor) (Quattoni and Torralba 2009), and Caltech-UCSD Bird (CUB) (Xie et al. 2013)... We use the VQA 1.0 (Agrawal et al. 2017) and VQA 2.0 (Goyal et al. 2017) datasets. |
| Dataset Splits | Yes | All models are trained on the training split and evaluated on the validation split. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'VGG-16 network', 'ResNet-152', 'RNN (LSTM for VQA 1.0 and GRU for VQA 2.0)', and 'GloVe word embedding module' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Following the work in (Yu and Salzmann 2018), the size of input images in DTD, Indoor, and CUB is 448 × 448, and the size of input images in MINC is 224 × 224. We use the VGG-16 network as the backbone, and layers after the conv5-3 layer are removed. Our FBC module is on top of the conv5-3 layer... We set the rank of U and V as 1, the number of atoms as 512, 2048, and 8192, and λ as 0.001. |
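The setup row above pins down the FBC head's key hyperparameters (rank-1 factors U and V; 512, 2048, or 8192 atoms; λ = 0.001), but the paper itself provides no pseudocode. Below is a minimal PyTorch sketch of such a factorized bilinear coding head, assuming rank-1 dictionary atoms B_k = u_k v_kᵀ, a one-step soft-thresholding surrogate for the LASSO code, and max pooling of the codes over spatial locations; the class name `FBCSketch` and the initialization scale are illustrative assumptions, not taken from the paper or its repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FBCSketch(nn.Module):
    """Hedged sketch of a factorized bilinear coding (FBC) head.

    Assumptions (not verbatim from the paper): rank-1 atoms
    B_k = u_k v_k^T, a one-step soft-threshold approximation of the
    LASSO code with sparsity weight lam, and spatial max pooling.
    """

    def __init__(self, in_dim=512, num_atoms=2048, lam=1e-3):
        super().__init__()
        # Factorized dictionary: columns u_k of U and v_k of V.
        self.U = nn.Parameter(torch.randn(in_dim, num_atoms) * 0.01)
        self.V = nn.Parameter(torch.randn(in_dim, num_atoms) * 0.01)
        self.lam = lam

    def forward(self, feat):
        # feat: (B, C, H, W) conv5-3 activations from a VGG-16 backbone.
        b, c, h, w = feat.shape
        x = feat.flatten(2).transpose(1, 2)        # (B, HW, C)
        # Rank-1 bilinear responses z_k = (u_k^T x)(v_k^T x) per location.
        z = (x @ self.U) * (x @ self.V)            # (B, HW, K)
        # Soft thresholding S_lam(z) as a sparse-code surrogate.
        codes = torch.sign(z) * F.relu(z.abs() - self.lam)
        # Aggregate the sparse codes over spatial locations.
        return codes.max(dim=1).values             # (B, K)
```

Under the paper's classification setting, conv5-3 activations from a 448 × 448 input form a 28 × 28 × 512 map, so `FBCSketch(in_dim=512, num_atoms=2048)` would produce one 2048-dimensional code per image, matching the middle dictionary size reported above.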