Space-time Mixing Attention for Video Transformer
Authors: Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our model produces very high recognition accuracy on the most popular video recognition datasets while at the same time being significantly more efficient than other Video Transformer models. and 4.1 Experimental setup |
| Researcher Affiliation | Collaboration | Adrian Bulat, Samsung AI Cambridge, adrian@adrianbulat.com; Juan-Manuel Perez-Rua, Samsung AI Cambridge, j.perez-rua@samsung.com; Swathikiran Sudhakaran, Samsung AI Cambridge, swathikir.s@samsung.com; Brais Martinez, Samsung AI Cambridge, brais.a@samsung.com; Georgios Tzimiropoulos, Samsung AI Cambridge and Queen Mary University of London, g.tzimiropoulos@qmul.ac.uk |
| Pseudocode | No | The paper describes the methods in narrative text and uses figures to illustrate concepts, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Code for our method is made available here. and We will release code and models to facilitate this. and in the checklist: [No] We include however all implementation details required to reproduce our work. We will make the code and the models available. |
| Open Datasets | Yes | Datasets: We train and evaluate the proposed models on the following datasets (all datasets are publicly available for research purposes): Kinetics-400 and 600: The Kinetics [21] dataset... Something-Something-v2 (SSv2): The SSv2 [17] dataset... Epic Kitchens-100 (Epic-100): is an egocentric large scale action recognition dataset... |
| Dataset Splits | No | The paper mentions using well-known datasets like Kinetics, Something-Something-v2, and Epic Kitchens-100, which often have standard splits. However, it does not explicitly state the train/validation/test split percentages or sample counts within the paper, nor does it cite a specific paper for the exact splits used for these datasets. |
| Hardware Specification | Yes | The models were trained on 8 V100 GPUs using PyTorch [30]. |
| Software Dependencies | No | The paper mentions 'PyTorch [30]' but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | specifically, our models were trained using SGD with momentum (0.9) and a cosine scheduler [28] (with linear warmup) for 35 epochs on SSv2, 50 on Epic-100 and 30 on Kinetics. The base learning rate, set at a batch size of 128, was 0.05 (0.03 for Kinetics). To prevent over-fitting we made use of the following augmentation techniques: random scaling (0.9 to 1.3) and cropping, random flipping (with probability of 0.5; not for SSv2) and autoaugment [8]. In addition, for SSv2 and Epic-100, we also applied random erasing (probability=0.5, min. area=0.02, max. area=1/3, min. aspect=0.3) [52] and label smoothing (λ = 0.3) [34] while, for Kinetics, we used mixup [51] (α = 0.4). |
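The quoted training setup (SGD with momentum 0.9, cosine schedule with linear warmup, base LR 0.05 at batch size 128) can be sketched as a plain learning-rate function. This is a minimal illustration, not the authors' code: the warmup length is an assumption, since the paper does not state it.

```python
import math

# Values quoted from the paper's experiment setup:
BASE_LR = 0.05      # base learning rate at batch size 128 (0.03 for Kinetics)
EPOCHS = 35         # SSv2; the paper uses 50 for Epic-100 and 30 for Kinetics
WARMUP_EPOCHS = 5   # hypothetical warmup length; not specified in the paper

def learning_rate(epoch, base_lr=BASE_LR, epochs=EPOCHS, warmup=WARMUP_EPOCHS):
    """LR at a given 0-indexed epoch: linear warmup, then cosine decay to 0."""
    if epoch < warmup:
        # Linear ramp from base_lr/warmup up to base_lr.
        return base_lr * (epoch + 1) / warmup
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup) / (epochs - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `learning_rate(0)` returns 0.01 (one fifth of the base rate under the assumed 5-epoch warmup) and `learning_rate(5)` returns the full 0.05, after which the rate decays along the cosine curve.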