Learning from Weakly-Labeled Web Videos via Exploring Sub-concepts
Authors: Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Zhichao Lu, Yun Fu, Tomas Pfister
Pages: 1341-1349
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of our method on four video action recognition datasets and a weakly-labeled image dataset to study the generalization ability. Experiments show that SPL outperforms several existing pre-training strategies and the learned representations lead to competitive results on several benchmarks. |
| Researcher Affiliation | Collaboration | Kunpeng Li¹*, Zizhao Zhang², Guanhang Wu², Xuehan Xiong², Chen-Yu Lee², Zhichao Lu², Yun Fu¹, Tomas Pfister²; ¹Northeastern University, ²Google Cloud AI |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate the proposed SPL algorithm on both common action recognition as well as fine-grained action recognition datasets. For the common action dataset, we mainly use Kinetics-200 (K200) (Xie et al. 2018) and SoccerNet (Giancola et al. 2018). We also follow recent works (Miech et al. 2020; Stroud et al. 2020) to conduct experiments on popular HMDB-51 (Kuehne et al. 2011) and UCF101 (Soomro, Zamir, and Shah 2012) datasets. We also test SPL on Clothing1M (Xiao et al. 2015). |
| Dataset Splits | Yes | In total, it contains 200 action categories with around 77K videos for training and 5K videos for validation. We use 5547 video clips for training and 5547 clips for validation obtained from different full-match videos. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper mentions "TensorFlow (Abadi et al. 2015)" but does not specify a version number for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | We use 3D ResNet-50 (Wang et al. 2018) with self-gating (Xie et al. 2018) as the baseline model... At training stage, we use the batch size of 6 and take 16 RGB frames with temporal stride 4 as the input. The spatial size of each frame is 224 × 224 pixels... For the pre-training on WebK200 sets, we set warm up training for 10 epochs with starting learning rate as 0.04 and then use learning rate 0.4 with cosine decay for 150 epochs. For the fine-tuning, we set warm up training for 10 epochs with starting learning rate as 0.04 and then use learning rate of 0.4 with cosine decay for 60 epochs. For SoccerNet dataset... we use learning rate of 0.005 for 20 epochs. |
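
The pre-training schedule and clip sampling quoted in the Experiment Setup row can be sketched as follows. This is an illustrative reading, not the authors' code: the quoted text gives only the warm-up length (10 epochs), the starting rate (0.04), the main rate (0.4), and the cosine-decay length (150 epochs). The linear shape of the warm-up, a decay that begins after warm-up and ends at zero, per-epoch granularity, and the names `learning_rate` and `frame_indices` are all assumptions made here.

```python
import math


def learning_rate(epoch, warmup_epochs=10, decay_epochs=150,
                  warmup_start_lr=0.04, peak_lr=0.4):
    """Hypothetical per-epoch schedule: warm-up followed by cosine decay."""
    if epoch < warmup_epochs:
        # Assumption: the warm-up ramps linearly from 0.04 up to 0.4.
        frac = epoch / warmup_epochs
        return warmup_start_lr + frac * (peak_lr - warmup_start_lr)
    # Assumption: the cosine phase covers decay_epochs after warm-up
    # and decays to zero; the rate stays at zero afterwards.
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def frame_indices(start, num_frames=16, stride=4):
    """Indices of a clip of 16 frames taken with temporal stride 4."""
    return [start + i * stride for i in range(num_frames)]


if __name__ == "__main__":
    for epoch in (0, 5, 10, 85, 160):
        print(epoch, round(learning_rate(epoch), 4))
    print(frame_indices(0))
```

Under the same assumptions, passing `decay_epochs=60` gives the quoted fine-tuning schedule. With these settings a single 16-frame clip spans 61 consecutive source frames (indices 0 through 60 when starting at frame 0).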