Efficient Compact Bilinear Pooling via Kronecker Product
Authors: Tan Yu, Yunfeng Cai, Ping Li (pp. 3170-3178)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Systematic experiments on four public benchmarks using two backbones demonstrate the efficiency and effectiveness of the proposed method in fine-grained recognition. |
| Researcher Affiliation | Industry | Tan Yu, Yunfeng Cai, Ping Li; Cognitive Computing Lab, Baidu Research; 10900 NE 8th St., Bellevue, Washington 98004, USA; No.10 Xibeiwang East Road, Beijing 100193, China; {tanyu01, caiyunfeng, liping11}@baidu.com |
| Pseudocode | Yes | Algorithm 1: Tensor Modal Product. 1: Input: r, X ∈ ℝ^{d×N}, Ā ∈ ℝ^{(a/r)×(d/r)}. 2: Output: T = [I_r ⊗ Ā]X. 3: Reshape X into a tensor 𝒳 ∈ ℝ^{N×(d/r)×r}. 4: Perform the modal product 𝒯 = 𝒳 ×₂ Ā ×₃ I_r. 5: Unfold the tensor 𝒯 along mode-1, and set T = 𝒯₍₁₎. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct experiments on four public benchmarks for fine-grained recognition including FGVC-Aircraft (AIR) (Maji et al. 2013), CUB-200-2011 (CUB) (Wah et al. 2011), MIT scene dataset (Quattoni and Torralba 2009), and Describable Texture Dataset (DTD) (Cimpoi et al. 2014). |
| Dataset Splits | No | The paper mentions using 'public benchmarks' but does not explicitly specify the training/validation/test dataset splits (e.g., percentages or sample counts) within the text. |
| Hardware Specification | Yes | The experiments are conducted on a single NVIDIA Titan X (Pascal) GPU card. |
| Software Dependencies | No | The paper states that the method is 'implemented in Paddle Paddle platform' but does not provide specific version numbers for PaddlePaddle or any other software dependencies. |
| Experiment Setup | Yes | We adopt a two-phase training. In the first phase, we only update parameters in TKPF and classifier layers. In the second phase, we fine-tune parameters of all layers. Each image is resized into 448 × 448... By default, we set a = b = 96, that is, D = 96². We set r = 32 by default when using the VGG16 backbone. Considering both effectiveness and efficiency, we set Q = 2 by default. |
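The tensor modal product in Algorithm 1 admits a direct vectorized implementation. Below is a minimal NumPy sketch (an illustration, not the authors' PaddlePaddle code): it applies [I_r ⊗ Ā] to X via a reshape and a mode-2 product, without ever materializing the full Kronecker product. Variable names and shapes follow the algorithm's notation; the function name is our own.

```python
import numpy as np

def tensor_modal_product(X, A_bar, r):
    """Compute T = [I_r ⊗ A_bar] X without forming the Kronecker product.

    X:     (d, N) feature matrix
    A_bar: (a/r, d/r) small projection matrix
    r:     number of Kronecker blocks (r divides d)
    """
    d, N = X.shape
    a_r, d_r = A_bar.shape                 # a/r and d/r
    assert d == d_r * r, "r must divide d"
    # Step 3: reshape X into a tensor of shape (N, d/r, r); each column of X
    # is split into r contiguous blocks of length d/r.
    Xt = X.T.reshape(N, r, d_r).transpose(0, 2, 1)    # (N, d/r, r)
    # Step 4: mode-2 product with A_bar (mode-3 product with I_r is a no-op).
    Tt = np.einsum('njr,aj->nar', Xt, A_bar)          # (N, a/r, r)
    # Step 5: unfold along mode-1 to recover the (a, N) output matrix.
    T = Tt.transpose(0, 2, 1).reshape(N, r * a_r).T   # (a, N)
    return T
```

Since I_r ⊗ Ā is block-diagonal with r copies of Ā, the result can be checked against an explicit `np.kron(np.eye(r), A_bar) @ X` for small shapes.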