Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Authors: Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency. |
| Researcher Affiliation | Collaboration | (1) Multimedia Laboratory, The Chinese University of Hong Kong; (2) Surrey Institute for People-Centred Artificial Intelligence, CVSSP, University of Surrey; (3) Centre for Perceptual and Interactive Intelligence Limited |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and model are available at https://github.com/linziyi96/st-adapter |
| Open Datasets | Yes | Datasets: For the benchmark experiments, we use two popular video action recognition datasets. Kinetics-400 (K400): The K400 [33] dataset contains 240k training videos and 20k validation videos labeled with 400 action categories. Something-Something-v2 (SSv2): The SSv2 [22] dataset consists of 220,487 videos covering 174 human actions. Epic-Kitchens-100 (EK100): The EK100 [13] dataset consists of 100 hours of video in egocentric perspective recording a person interacting with a variety of objects in the kitchen. |
| Dataset Splits | Yes | Kinetics-400 (K400): The K400 [33] dataset contains 240k training videos and 20k validation videos labeled with 400 action categories. |
| Hardware Specification | Yes | 8 V100 GPUs |
| Software Dependencies | No | The paper mentions "PyTorch, TensorFlow, TensorRT, and TorchScript" as deep learning toolboxes but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All details, including training and testing settings and module instantiation details, are provided in the appendix. (Section 4.1) We use one ST-Adapter with bottleneck width 384 before MHSA in each Transformer block. (Section 4.3 Ablations) All models are trained using 8 frames and tested with 3 views. (Table 1) |