Ripple Attention for Visual Perception with Sub-quadratic Complexity
Authors: Lin Zheng, Huijie Pan, Lingpeng Kong
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method by conducting extensive experiments on image classification and object detection tasks (Section 4). Ripple attention significantly improves the accuracy of the original vision transformer in image classification and performs competitively with detection transformers for object detection (Section 4.3), at asymptotically faster runtime (Section 5.3). Further analysis on the rippling distance and ablation studies (Section 5.1) indicate that ripple attention favors contributions from tokens in the vicinity yet preserves global information from long-term dependencies. |
| Researcher Affiliation | Collaboration | Department of Computer Science, The University of Hong Kong; Shanghai Artificial Intelligence Laboratory. |
| Pseudocode | Yes | Algorithm 1: Dynamic Programming for Ripple Attention (a hedged sketch of the ring-wise aggregation idea appears after this table). |
| Open Source Code | No | The paper mentions implementing the model with PyTorch and the timm toolkit, but it provides no link or explicit statement that the authors' own code is publicly available. |
| Open Datasets | Yes | Datasets: For image classification, we evaluate our model on standard benchmark datasets: (1) the ImageNet1k dataset (Deng et al., 2009), consisting of approximately 1,280K/50K images of 1000 classes for the training/validation splits respectively; (2) CIFAR-100 (Krizhevsky et al., 2009), which contains 50K images of 100 classes for training and 10K for evaluation. For detection tasks, we conduct our experiment on the COCO benchmark (Lin et al., 2014), consisting of 118k training and 5k validation images. (See the dataset-loading sketch after the table.) |
| Dataset Splits | Yes | ImageNet1k dataset (Deng et al., 2009), consisting of approximately 1,280K/50K images of 1000 classes for the training/validation splits respectively; (2) CIFAR-100 (Krizhevsky et al., 2009), which contains 50K images of 100 classes for training and 10K for evaluation. For detection tasks, we conduct our experiment on the COCO benchmark (Lin et al., 2014), consisting of 118k training and 5k validation images. |
| Hardware Specification | Yes | All models are tested with a batch size of 4 on a single NVIDIA V100 GPU machine, averaged over 10 runs. ... We use the AdamW optimizer (Loshchilov & Hutter, 2019) to train our model on 8 NVIDIA V100 GPUs for 300 epochs... All detection models are trained on 8 NVIDIA V100 GPUs with a total batch size of 16. (See the timing-harness sketch after the table.) |
| Software Dependencies | No | We implement our model using PyTorch (Paszke et al., 2019) and the PyTorch Image Models (timm) toolkit (Wightman, 2019). While these are specific tools, the paper gives no explicit version numbers (e.g., PyTorch 1.9, timm 0.5.0), which are required for full reproducibility. |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov & Hutter, 2019) to train our model on 8 NVIDIA V100 GPUs for 300 epochs on both the CIFAR-100 and ImageNet1k datasets. ... For the ImageNet1k dataset we set the batch size to 1024 and the learning rate to 0.001 with cosine learning rate decay ... The image size is set to 224×224 with patch size 16 ... for the CIFAR-100 dataset, the batch size and the learning rate are set to 512 and 0.0005 respectively, with the same cosine learning rate decay. ... The dropout rate is set to 0.1. (See the training-recipe sketch after the table.) |
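
The paper's Algorithm 1 is not reproduced in this report. As a heavily hedged illustration of the idea its title names, the sketch below aggregates value vectors ring by ring around a query position using a 2D summed-area table (prefix sums), so each ring sum costs O(1). The function names and the per-query loop are our own simplification, not the authors' algorithm.

```python
import torch
import torch.nn.functional as F

def summed_area_table(v: torch.Tensor) -> torch.Tensor:
    """2D prefix sums over an (H, W, d) feature map, zero-padded so box queries need no branching."""
    s = v.cumsum(dim=0).cumsum(dim=1)
    return F.pad(s, (0, 0, 1, 0, 1, 0))  # one leading zero row/column along H and W

def box_sum(sat: torch.Tensor, i0: int, i1: int, j0: int, j1: int) -> torch.Tensor:
    """Sum over the inclusive box [i0..i1] x [j0..j1] in O(1) via the padded table."""
    return sat[i1 + 1, j1 + 1] - sat[i0, j1 + 1] - sat[i1 + 1, j0] + sat[i0, j0]

def ring_sums(v: torch.Tensor, i: int, j: int) -> torch.Tensor:
    """Per-ring value sums around query (i, j); ring r holds tokens at Chebyshev distance r."""
    H, W, d = v.shape
    sat = summed_area_table(v)
    max_r = max(i, j, H - 1 - i, W - 1 - j)
    rings, prev = [], torch.zeros(d, dtype=v.dtype, device=v.device)
    for r in range(max_r + 1):
        i0, i1 = max(i - r, 0), min(i + r, H - 1)
        j0, j1 = max(j - r, 0), min(j + r, W - 1)
        box = box_sum(sat, i0, i1, j0, j1)
        rings.append(box - prev)  # ring r = box of radius r minus the box of radius r-1
        prev = box
    return torch.stack(rings)  # (max_r + 1, d)
```

With O(1) per ring and O(max(H, W)) rings per query, aggregating over an H×W map of N = HW tokens costs O(N√N) in total, which is consistent with the sub-quadratic complexity named in the title.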
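For reference, all of the cited benchmarks are available through torchvision. The loaders below are a hypothetical sketch; the root paths are placeholders, and the paper does not describe its data pipeline.

```python
from torchvision import datasets

# Hypothetical loaders for the cited benchmarks; root paths are placeholders.
cifar_train = datasets.CIFAR100(root="data", train=True, download=True)   # 50K training images, 100 classes
cifar_val   = datasets.CIFAR100(root="data", train=False, download=True)  # 10K evaluation images
inet_train  = datasets.ImageNet(root="data/imagenet", split="train")      # ~1,280K images, 1000 classes
inet_val    = datasets.ImageNet(root="data/imagenet", split="val")        # 50K validation images
coco_train  = datasets.CocoDetection(
    root="data/coco/train2017",                                           # 118k training images
    annFile="data/coco/annotations/instances_train2017.json",
)
```

Note that `datasets.ImageNet` expects the manually downloaded ImageNet archives to already sit under the given root.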
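The reported runtime protocol (batch size 4, a single V100, averaged over 10 runs) can be mirrored with a simple harness. This is a generic sketch, not the authors' benchmarking code; the warm-up count is our assumption.

```python
import time
import torch

@torch.no_grad()
def benchmark(model: torch.nn.Module, batch_size=4, image_size=224, runs=10, device="cuda") -> float:
    """Average forward-pass latency in seconds, following the reported protocol."""
    model = model.to(device).eval()
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(3):             # warm-up so lazy CUDA initialization does not skew timings
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()       # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / runs
```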
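The quoted hyperparameters map directly onto a standard PyTorch/timm training loop. The sketch below is a hypothetical reconstruction of the ImageNet1k recipe; the model name is a stand-in, since the authors' ripple-attention ViT is not released.

```python
import torch
import timm

# Hypothetical recipe assembled from the reported hyperparameters; the model name is a stand-in.
model = timm.create_model("vit_tiny_patch16_224", drop_rate=0.1)  # 224x224 input, patch size 16, dropout 0.1
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)        # reported ImageNet1k learning rate
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)  # cosine decay over 300 epochs

for epoch in range(300):
    ...  # one pass over ImageNet1k at global batch size 1024 (e.g., 128 per GPU on 8 V100s)
    scheduler.step()
```

For CIFAR-100, the quoted recipe swaps in batch size 512 and learning rate 5e-4 with the same cosine schedule.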