Alignment-guided Temporal Attention for Video Action Recognition
Authors: Yizhou Zhao, Zhenyang Li, Xun Guo, Yan Lu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple benchmarks demonstrate the superiority and generality of our module. ... 4 Experiments ... 4.1 Experimental Setting ... Benchmarks. We employ two widely used video action recognition datasets, i.e., Kinetics-400 (K400) [22] and Something-Something V2 (SSv2) [14], in our experiments. ... 4.2 Comparison Results ... 4.3 Ablation study |
| Researcher Affiliation | Collaboration | Yizhou Zhao¹, Zhenyang Li², Xun Guo³, Yan Lu³ (¹Carnegie Mellon University, ²Tsinghua University, ³Microsoft Research Asia) |
| Pseudocode | No | The paper describes its methods using mathematical equations and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We employ two widely used video action recognition datasets, i.e., Kinetics-400 (K400) [22] and Something-Something V2 (SSv2) [14], in our experiments. |
| Dataset Splits | Yes | Kinetics-400 contains 240k training videos and 30k validation videos in 400 classes of human actions. Something-Something V2 consists of 168.9K training videos and 24.7K validation videos for 174 classes. We provide the top-1 and top-5 accuracy on the validation sets, the inference complexity measured with FLOPs, and the model capacity in terms of the number of parameters. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for its experiments. The self-evaluation checklist also confirms 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]'. |
| Software Dependencies | No | The paper mentions models and optimizers (e.g., 'TimeSformer', 'SGD') but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Implementation details. We use TimeSformer [3] with the officially released model pretrained on ImageNet-21K [23] as our baseline. ... We adopt SGD to optimize our network for 30 epochs with a mini-batch size of 64. The initial learning rate is set to 0.005 with 0.1 decays on the 21st and 27th epochs. All patch embeddings are applied with a weight decay of 1e-4, while the class tokens and the positional embeddings use no weight decay. ... The resolution of 224×224 is used throughout all the experiments. (A hedged code sketch of this recipe follows the table.) |
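The training recipe quoted in the Experiment Setup row is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch rendering of the optimizer and learning-rate schedule. Since the paper names no framework (see the Software Dependencies row), the PyTorch API choice, the SGD momentum value, and the `cls_token`/`pos_embed` parameter names (borrowed from common ViT/TimeSformer implementations) are all assumptions, not details from the paper.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR


def build_optimizer(model: torch.nn.Module):
    """Sketch of the quoted recipe: SGD, 30 epochs, batch size 64,
    lr 0.005 decayed by 0.1 at epochs 21 and 27, weight decay 1e-4
    except on class tokens and positional embeddings."""
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Assumed parameter names: class tokens and positional
        # embeddings are exempt from weight decay per the paper.
        if "cls_token" in name or "pos_embed" in name:
            no_decay.append(param)
        else:
            decay.append(param)  # patch embeddings and remaining weights

    optimizer = SGD(
        [
            {"params": decay, "weight_decay": 1e-4},
            {"params": no_decay, "weight_decay": 0.0},
        ],
        lr=0.005,      # initial learning rate from the paper
        momentum=0.9,  # assumption: momentum is not reported
    )
    # Multiply the learning rate by 0.1 at the start of epochs 21 and 27.
    scheduler = MultiStepLR(optimizer, milestones=[21, 27], gamma=0.1)
    return optimizer, scheduler
```

In a training loop this pairs with one `scheduler.step()` call per epoch over the 30-epoch run; how exactly non-embedding weights are grouped for weight decay is not stated in the paper, so the grouping above is one plausible reading.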