Focal Modulation Networks
Authors: Jianwei Yang, Chunyuan Li, Xiyang Dai, Jianfeng Gao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show FocalNets outperform the state-of-the-art SA counterparts (e.g., Swin and Focal Transformers) with similar computational cost on the tasks of image classification, object detection, and semantic segmentation. |
| Researcher Affiliation | Industry | Jianwei Yang, Chunyuan Li, Xiyang Dai, Jianfeng Gao {jianwyan,chunyl,xidai,jfgao}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Pseudo code for Focal Modulation. |
| Open Source Code | Yes | Code is available at: https://github.com/microsoft/FocalNet. |
| Open Datasets | Yes | We compare different methods on ImageNet-1K classification [16]. Overall, we train FocalNet-T, FocalNet-S and FocalNet-B with ImageNet-1K training set... When pretrained on ImageNet-22K... We make comparisons on object detection with COCO 2017 [42]. We use ADE20K [95] for our experiments |
| Dataset Splits | Yes | report Top-1 accuracy (%) on the validation set. ...evaluated on 5K validation images. ...ADE20K validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch-style pseudo code' and 'AdamW' as the optimizer but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For training, we use AdamW [48] as the optimizer with initial learning rate 10⁻⁴ and weight decay 0.05. All models are trained with batch size 16. We set the stochastic drop rates to 0.1, 0.2, 0.3 in the 1× and 0.3, 0.5, 0.5 in the 3× training schedule for FocalNet-T/S/B, respectively. |
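
The "Pseudocode" row above points to Algorithm 1, the paper's PyTorch-style pseudo code for focal modulation. Below is a minimal, runnable sketch of that operator, assuming the layer names (`f`, `h`, `proj`, `focal_layers`) and the `focal_level`/`focal_window` hyperparameters follow the released FocalNet code; treat it as an illustration of the technique, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalModulation(nn.Module):
    """Minimal sketch of the focal modulation operator (Algorithm 1)."""
    def __init__(self, dim, focal_level=3, focal_window=3):
        super().__init__()
        self.focal_level = focal_level
        # Single projection producing query, context, and per-level gates.
        self.f = nn.Linear(dim, 2 * dim + (focal_level + 1))
        self.h = nn.Conv2d(dim, dim, kernel_size=1)   # modulator projection
        self.proj = nn.Linear(dim, dim)               # output projection
        # Hierarchical depth-wise convolutions with growing kernel sizes.
        self.focal_layers = nn.ModuleList()
        for k in range(focal_level):
            kernel = focal_window + 2 * k
            self.focal_layers.append(nn.Sequential(
                nn.Conv2d(dim, dim, kernel, groups=dim,
                          padding=kernel // 2, bias=False),
                nn.GELU()))

    def forward(self, x):
        # x: (B, H, W, C), channels-last as in the paper's pseudo code.
        B, H, W, C = x.shape
        x = self.f(x).permute(0, 3, 1, 2)  # (B, 2C + L + 1, H, W)
        q, ctx, gates = torch.split(x, (C, C, self.focal_level + 1), dim=1)
        # Gated aggregation over L focal levels plus one global level.
        ctx_all = 0
        for l in range(self.focal_level):
            ctx = self.focal_layers[l](ctx)
            ctx_all = ctx_all + ctx * gates[:, l:l + 1]
        ctx_global = F.gelu(ctx.mean(dim=(2, 3), keepdim=True))
        ctx_all = ctx_all + ctx_global * gates[:, self.focal_level:]
        # Modulate the query, then project back to channels-last.
        out = q * self.h(ctx_all)
        return self.proj(out.permute(0, 2, 3, 1))
```

For example, `FocalModulation(96)(torch.randn(2, 56, 56, 96))` returns a tensor of the same shape, mirroring how the operator replaces self-attention inside a Swin-style block.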
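The "Experiment Setup" row is terse but maps directly onto standard PyTorch. The sketch below restates it under stated assumptions: `build_optimizer` and the `DROP_PATH_RATE` lookup table are hypothetical names introduced here for illustration, and the batch size belongs to the data loader rather than the optimizer.

```python
import torch
from torch.optim import AdamW

# Stochastic depth (drop path) rates quoted above, indexed by
# training schedule (1x / 3x) and model size (T / S / B).
DROP_PATH_RATE = {
    "1x": {"T": 0.1, "S": 0.2, "B": 0.3},
    "3x": {"T": 0.3, "S": 0.5, "B": 0.5},
}

def build_optimizer(model: torch.nn.Module) -> AdamW:
    # AdamW with initial learning rate 1e-4 and weight decay 0.05,
    # matching the setup quoted in the table.
    return AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```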