Region-Based Global Reasoning Networks
Authors: Chuanming Wang, Huiyuan Fu, Charles X. Ling, Peilun Du, Huadong Ma | pp. 12136–12143
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our approach, we apply ReGR to fine-grained classification and action recognition benchmark tasks, and the experimental results demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Academia | 1Beijing University of Posts and Telecommunications, China 2Western University, Canada {wcm, fhy, dupeilun1995, mhd}@bupt.edu.cn, charles.ling@uwo.ca |
| Pseudocode | No | The paper presents figures and mathematical formulations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | For fine-grained classification task, we adopt the Birds-200-2011 (CUB) dataset (Welinder et al. 2010) as the benchmark dataset. For action recognition task, we evaluate our approach on the UCF101 (Soomro, Zamir, and Shah 2012) and Kinetics (Carreira and Zisserman 2017). |
| Dataset Splits | Yes | We report the results of UCF101 according to its official split 1 file. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software components like SGD optimizer and Batch Norm but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Training. We use the models pretrained on ImageNet to initialize the weights and set the weight of the BN layer in our module to zero. A dropout layer with ratio 0.5 is inserted after the pooling layer to avoid overfitting. For the action recognition task, we randomly crop out 64 consecutive frames from the full-length video and then extract 8 frames with random interval. We resize the shorter side randomly in [256, 320] and crop a random 224×224 patch as input. For the fine-grained classification task, we crop a patch whose area is random in [0.08, 1.25] of the original input, then resize the patch to 448×448. We adopt SGD as the optimizer with a weight decay of 0.0001 and momentum of 0.9. The strategy of gradual warmup is used during training. We train our models for 100 epochs in total, starting with a learning rate of 0.01 and reducing it by a factor of 10 at the 30th, 60th and 80th epochs, respectively. |