EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens

Authors: Sunil Hwang, Jaehong Yoon, Youngwan Lee, Sung Ju Hwang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. We extensively validate our proposed method on multiple benchmark datasets, including UCF101, HMDB51, K400, Something-Something V2, and Ego4D, and our EVEREST shows remarkable efficiency in terms of memory occupancy, computational cost, and training time compared to strong counterparts, achieving competitive performance.
Researcher Affiliation | Collaboration | 1 Korea Military Academy, 2 UNC Chapel Hill, 3 KAIST, 4 ETRI, 5 Deep Auto.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks (a hypothetical illustration of the token-removal idea appears after this table).
Open Source Code | No | The paper refers to the VideoMAE repository as the base for its implementation but does not explicitly state that the code for EVEREST (the proposed method) is open source or provide a link to it.
Open Datasets | Yes | We extensively validate our proposed method on multiple benchmark datasets, including UCF101 (Soomro et al., 2012), HMDB51 (Kuehne et al., 2011), Something-Something v2 (SSv2) (Goyal et al., 2017), Kinetics-400 (K400) (Kay et al., 2017), and Ego4D (Grauman et al., 2022).
Dataset Splits | Yes | The OSCC dataset is a subset of the Ego4D dataset, consisting of 41.1k/21.2k train/val 8-second videos.
Hardware Specification | Yes | Enabling pre-training and fine-tuning on a single machine with 8 GPUs; one node equipped with 8 A100 (80GB) GPUs; a single-node machine equipped with 8 A6000 (48GB) GPUs; 4 NVIDIA RTX 3090 GPUs are used.
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer and cosine decay schedule but does not provide specific version numbers for these or for other key software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table 7 (pre-training settings for K400, SSv2, UCF101, HMDB51, and OSCC) and Table 8 (fine-tuning settings for the same datasets) specify detailed hyperparameters such as optimizer, learning rate, batch size, warmup epochs, masking ratios, and augmentation strategies (an illustrative configuration sketch appears below).
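Because the paper contains no pseudocode (per the Pseudocode row above), the sketch below shows one generic, frame-difference-based way to drop redundant spatiotemporal tokens, i.e., the kind of idea the paper's title names. The selection criterion, the function name keep_informative_tokens, the keep_ratio parameter, and the tensor shapes are all illustrative assumptions and should not be read as EVEREST's actual algorithm.

import torch

def keep_informative_tokens(tokens: torch.Tensor, keep_ratio: float = 0.5):
    """Keep the tokens that change most between temporally adjacent frames.

    tokens: (T, N, D) patch embeddings -- T frames, N patches per frame, D dims.
    Returns the kept tokens as (K, D) plus their flat indices into (T*N,).
    """
    T, N, D = tokens.shape
    # Change magnitude w.r.t. the previous frame; the first frame is compared to itself.
    prev = torch.cat([tokens[:1], tokens[:-1]], dim=0)
    change = (tokens - prev).norm(dim=-1).flatten()   # (T*N,)
    k = max(1, int(keep_ratio * T * N))
    keep_idx = change.topk(k).indices                 # most dynamic tokens survive
    return tokens.reshape(T * N, D)[keep_idx], keep_idx

# Usage with random embeddings standing in for real patch tokens.
dummy = torch.randn(8, 196, 768)                      # 8 frames, 14x14 patches, ViT-B width
kept, idx = keep_informative_tokens(dummy, keep_ratio=0.25)
print(kept.shape)                                     # torch.Size([392, 768])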
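Since the paper reports its optimizer, schedule, and masking settings only in Tables 7 and 8, the following is a minimal sketch of what such a pre-training setup could look like in PyTorch. Only the use of AdamW and cosine decay comes from the paper; every numeric value, the config keys, and the placeholder model are hypothetical and are not taken from the paper's tables.

import torch
import torch.nn as nn

# Hypothetical hyperparameters, standing in for the values listed in Tables 7/8.
config = {
    "base_lr": 1.5e-4,       # placeholder learning rate
    "weight_decay": 0.05,    # placeholder weight decay
    "warmup_epochs": 5,      # placeholder warmup length
    "total_epochs": 100,     # placeholder pre-training length
    "mask_ratio": 0.9,       # placeholder: fraction of tokens hidden from the encoder
}

model = nn.Linear(768, 768)  # stand-in for the actual masked video autoencoder

optimizer = torch.optim.AdamW(
    model.parameters(), lr=config["base_lr"], weight_decay=config["weight_decay"]
)

# Linear warmup chained into cosine decay, a common pairing with AdamW.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, total_iters=config["warmup_epochs"]
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=config["total_epochs"] - config["warmup_epochs"]
)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[config["warmup_epochs"]]
)

for epoch in range(config["total_epochs"]):
    optimizer.zero_grad()
    # Dummy objective; a real masked-video reconstruction loss would go here.
    loss = model(torch.randn(4, 768)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()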