Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

Authors: Yixuan Pei, Zhiwu Qing, Jun CEN, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on multiple challenging benchmarks, i.e., HMDB51, UCF101 and Something-Something V2, demonstrate that FrameMaker can achieve better performance than recent advanced methods while consuming only 20% memory. Additionally, under the same memory consumption conditions, FrameMaker significantly outperforms existing state-of-the-art methods by a convincing margin.
Researcher Affiliation | Collaboration | Xi'an Jiaotong University, Huazhong University of Science and Technology, The Hong Kong University of Science and Technology, Alibaba Group
Pseudocode | No | The paper includes a figure (Figure 2) that illustrates the FrameMaker framework, but it does not contain any formal pseudocode blocks or algorithms.
Open Source Code | No | The checklist in the paper states: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]"
Open Datasets | Yes | The proposed FrameMaker is evaluated on three standard action recognition datasets: UCF101 [49], HMDB51 [28] and Something-Something V2 [18].
Dataset Splits | Yes | For UCF101, the model is trained on 51 classes first, and the remaining 50 classes are divided into 5, 10 and 25 tasks. For HMDB51, we train the base model using videos from 26 classes, and the remaining 25 classes are separated into 5 or 25 groups. For Something-Something V2, we first train 84 classes in the initial stage, and generate groups of 10 and 5 classes. (A split sketch in code follows the table.)
Hardware Specification | Yes | We train all models on eight NVIDIA V100 GPUs and use PyTorch [42] for all our experiments.
Software Dependencies | No | The paper mentions PyTorch [42] but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | For UCF101, we train a ResNet-34 TSM for 50 epochs with a batch size of 256 from an initial learning rate of 0.04. For HMDB51 and Something-Something V2, we train a ResNet-50 TSM for 50 epochs with a batch size of 128 from an initial learning rate of 1e-3 and 0.04, respectively. All networks are first pre-trained on ImageNet [8] for initialization. These settings are consistent with TCD [41]. (A configuration sketch follows the table.)
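
The class-incremental protocol in the Dataset Splits row can be made concrete with a short sketch. The class counts below come from the paper; the function name, the fixed shuffling seed, and the equal-size grouping are assumptions for illustration, not the authors' released split code.

```python
import random

def make_incremental_splits(num_classes, num_base, num_tasks, seed=1000):
    """Split class IDs into a base set plus num_tasks equal incremental groups."""
    classes = list(range(num_classes))
    random.Random(seed).shuffle(classes)  # fixed seed for a reproducible class order
    base, rest = classes[:num_base], classes[num_base:]
    step = len(rest) // num_tasks
    tasks = [rest[i * step:(i + 1) * step] for i in range(num_tasks)]
    return base, tasks

# UCF101: 51 base classes, remaining 50 classes divided into 5, 10 or 25 tasks.
base, tasks = make_incremental_splits(num_classes=101, num_base=51, num_tasks=10)
assert len(base) == 51 and all(len(t) == 5 for t in tasks)

# HMDB51: 26 base classes, remaining 25 classes separated into 5 or 25 groups.
base, tasks = make_incremental_splits(num_classes=51, num_base=26, num_tasks=5)
assert len(base) == 26 and all(len(t) == 5 for t in tasks)
```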
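
Similarly, the per-dataset hyper-parameters quoted in the Experiment Setup row can be collected into one configuration sketch. The dictionary layout and key names are illustrative assumptions; the values themselves are taken from the paper.

```python
# Training hyper-parameters per dataset, as reported in the paper.
# Structure and key names are assumptions for illustration.
TRAIN_CONFIGS = {
    "ucf101": {"backbone": "ResNet-34 TSM", "epochs": 50,
               "batch_size": 256, "initial_lr": 0.04},
    "hmdb51": {"backbone": "ResNet-50 TSM", "epochs": 50,
               "batch_size": 128, "initial_lr": 1e-3},
    "something-something-v2": {"backbone": "ResNet-50 TSM", "epochs": 50,
                               "batch_size": 128, "initial_lr": 0.04},
}
# All backbones are initialized from ImageNet pre-training, consistent with TCD [41].
```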