Efficient Compact Bilinear Pooling via Kronecker Product

Authors: Tan Yu, Yunfeng Cai, Ping Li (pp. 3170-3178)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Systematic experiments on four public benchmarks using two backbones demonstrate the efficiency and effectiveness of the proposed method in fine-grained recognition.
Researcher Affiliation | Industry | Tan Yu, Yunfeng Cai, Ping Li; Cognitive Computing Lab, Baidu Research; 10900 NE 8th St., Bellevue, Washington 98004, USA; No.10 Xibeiwang East Road, Beijing 100193, China; {tanyu01, caiyunfeng, liping11}@baidu.com
Pseudocode | Yes | Algorithm 1: Tensor Modal Product. 1: Input: r, X ∈ R^(d×N), Â ∈ R^((a/r)×(d/r)). 2: Output: T = [I_r ⊗ Â]X. 3: Reshape X into a tensor 𝒳 ∈ R^(N×(d/r)×r). 4: Perform the modal product 𝒯 = 𝒳 ×_2 Â ×_3 I_r. 5: Unfold the tensor 𝒯 along mode 1, and set T = 𝒯_(1). (A hedged NumPy sketch of this computation is given after the table.)
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We conduct experiments on four public benchmarks for fine-grained recognition including FGVC-Aircraft (AIR) (Maji et al. 2013), CUB-200-2011 (CUB) (Wah et al. 2011), MIT scene dataset (Quattoni and Torralba 2009), and Describable Texture Dataset (DTD) (Cimpoi et al. 2014).
Dataset Splits | No | The paper mentions using 'public benchmarks' but does not explicitly specify the training/validation/test dataset splits (e.g., percentages or sample counts) within the text.
Hardware Specification | Yes | The experiments are conducted on a single NVIDIA Titan X (Pascal) GPU card.
Software Dependencies | No | The paper states that the method is 'implemented in the PaddlePaddle platform' but does not provide specific version numbers for PaddlePaddle or any other software dependencies.
Experiment Setup | Yes | We adopt a two-phase training scheme. In the first phase, we only update the parameters of the TKPF and classifier layers. In the second phase, we fine-tune the parameters of all layers. Each image is resized to 448 × 448... By default, we set a = b = 96, that is, D = 96^2. We set r = 32 by default when using the VGG16 backbone. Considering both effectiveness and efficiency, we set Q = 2 by default. (A hypothetical configuration sketch summarizing these defaults follows the table.)
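
For reference, the following is a minimal NumPy sketch of the tensor modal product quoted in the Pseudocode row, assuming X ∈ R^(d×N) with d divisible by r and Â ∈ R^((a/r)×(d/r)). The function name and variable names are illustrative; the paper releases no code, so this is only an assumed reading of Algorithm 1, not the authors' implementation.

```python
import numpy as np

def tensor_modal_product(r, X, A_hat):
    """Sketch of Algorithm 1: compute T = (I_r kron A_hat) @ X
    without materializing the a x d Kronecker factor."""
    d, N = X.shape
    assert d % r == 0, "d must be divisible by r"
    # Split every column of X into r consecutive blocks of length d/r
    # (the reshape step of Algorithm 1, written block-wise).
    blocks = X.T.reshape(N, r, d // r)                 # (N, r, d/r)
    # Apply A_hat to each block; this is the modal product with A_hat,
    # while the product with I_r leaves the block index untouched.
    mapped = np.einsum('nrd,ad->nra', blocks, A_hat)   # (N, r, a/r)
    # Unfold back to a matrix of shape (a, N), with a = r * (a/r).
    a = r * A_hat.shape[0]
    return mapped.reshape(N, a).T

# Quick self-check against the explicit Kronecker product.
r, d, N = 4, 8, 3
A_hat = np.random.randn(3, d // r)                     # a/r = 3, so a = 12
X = np.random.randn(d, N)
T_fast = tensor_modal_product(r, X, A_hat)
T_ref = np.kron(np.eye(r), A_hat) @ X
assert np.allclose(T_fast, T_ref)
```

The self-check at the end illustrates why the reshaping trick is attractive: the block-wise product touches only an (a/r) × (d/r) matrix, whereas the explicit Kronecker factor is a × d.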
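The defaults quoted in the Experiment Setup row can be collected into a small configuration dictionary for anyone attempting a reproduction. This is a hypothetical sketch; none of the key names come from the authors' code, since no code is released.

```python
# Hypothetical reproduction defaults distilled from the Experiment Setup row;
# key names are illustrative, not from any official release.
TKPF_DEFAULTS = {
    "input_size": (448, 448),       # each image is resized to 448 x 448
    "a": 96,
    "b": 96,                        # D = a * b = 96^2 pooled dimensions
    "r": 32,                        # default r for the VGG16 backbone
    "Q": 2,                         # default Q (effectiveness/efficiency trade-off)
    "training_phases": [
        {"phase": 1, "update": ["TKPF", "classifier"]},  # backbone frozen
        {"phase": 2, "update": ["all layers"]},          # full fine-tuning
    ],
}
```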