AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Authors: Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on Something V1 & V2, Jester and Mini-Kinetics show that our approach can achieve about 40% computation savings with comparable accuracy to state-of-the-art methods."
Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²MIT-IBM Watson AI Lab, ³IBM Research, ⁴Microsoft, ⁵Boston University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. (A hedged sketch of the fusion mechanism is given below the table.)
Open Source Code | Yes | "The project page can be found at https://mengyuest.github.io/AdaFuse/"
Open Datasets | Yes | "We evaluate AdaFuse on Something-Something V1 (Goyal et al., 2017) & V2 (Mahdisoltani et al., 2018), Jester (Materzynska et al., 2019) and a subset of Kinetics (Kay et al., 2017)."
Dataset Splits | Yes | "Jester (Materzynska et al., 2019) has 27 annotated classes for hand gestures, with 119k / 15k videos in training / validation set. Mini-Kinetics (assembled by Meng et al. (2020)) is a subset of the full Kinetics dataset (Kay et al., 2017) containing 121k videos for training and 10k videos for testing across 200 action classes."
Hardware Specification | Yes | "…where each experiment takes 12–24 hours on 4 Tesla V100 GPUs."
Software Dependencies | No | The paper mentions general methods such as back-propagation and the Gumbel-Softmax estimator but does not specify any software dependencies with version numbers. (A straight-through Gumbel-Softmax sketch follows below the table.)
Experiment Setup | Yes | "We uniformly sample T = 8 frames from each video. The input dimension for the network is 224 × 224. Random scaling and cropping are used as data augmentation during training (and we further adopt random flipping for Mini-Kinetics). Center cropping is used during inference. All our networks use ImageNet pretrained weights. We follow a step-wise learning rate scheduler with the initial learning rate as 0.002 and decay by 0.1 at epochs 20 & 40. To train our adaptive temporal fusion approach, we set the efficiency term λ = 0.1. We train all the models for 50 epochs with a batch-size of 64." (The quoted values are wired into a training-loop sketch below the table.)
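
Although the paper itself provides no pseudocode (see the Pseudocode row), the adaptive temporal fusion it proposes is easy to summarize. Below is a minimal, hypothetical PyTorch sketch (module, layer, and variable names are ours, not from the official AdaFuse release): a lightweight policy head pools the current and previous frame features and emits a three-way, per-channel decision (keep the freshly computed feature, reuse the previous frame's feature, or skip the channel), sampled with the straight-through Gumbel-Softmax so the discrete choice remains trainable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemporalFusion(nn.Module):
    """Hypothetical sketch of AdaFuse-style per-channel fusion.

    For each channel the policy picks one of three actions:
      0: keep  - use the freshly computed feature of the current frame,
      1: reuse - copy the feature from the previous frame (saves compute),
      2: skip  - zero out the channel entirely.
    """

    def __init__(self, channels: int, tau: float = 1.0):
        super().__init__()
        self.tau = tau
        # Policy head: pooled current + previous features -> 3 logits/channel.
        self.policy = nn.Linear(2 * channels, 3 * channels)

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (batch, channels, H, W) features of adjacent frames.
        b, c = curr.shape[:2]
        ctx = torch.cat([curr.mean(dim=(2, 3)), prev.mean(dim=(2, 3))], dim=1)
        logits = self.policy(ctx).view(b, c, 3)
        # One-hot decisions in the forward pass, soft gradients in the
        # backward pass (see the Gumbel-Softmax sketch below).
        mask = F.gumbel_softmax(logits, tau=self.tau, hard=True)  # (b, c, 3)
        keep = mask[..., 0].view(b, c, 1, 1)
        reuse = mask[..., 1].view(b, c, 1, 1)
        # mask[..., 2] is "skip": it simply contributes zeros.
        return keep * curr + reuse * prev
```

The fraction of "keep" decisions is what an efficiency term in the loss (weighted by λ = 0.1 in the quoted setup) would penalize, pushing the policy toward the cheaper reuse and skip actions.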
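
The Gumbel-Softmax estimator mentioned in the Software Dependencies row is what makes those discrete decisions compatible with back-propagation. PyTorch already ships it as `torch.nn.functional.gumbel_softmax`; the version below is written out by hand purely for clarity and is the standard straight-through formulation, not code from the paper.

```python
import torch

def gumbel_softmax_st(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Straight-through Gumbel-Softmax over the last dimension."""
    # Sample Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    gumbels = -torch.log(-torch.log(u + 1e-10) + 1e-10)
    # Relaxed (soft) sample; tau controls how close it is to one-hot.
    soft = torch.softmax((logits + gumbels) / tau, dim=-1)
    # Hard one-hot sample taken at the argmax of the relaxed sample.
    hard = torch.zeros_like(soft).scatter_(-1, soft.argmax(-1, keepdim=True), 1.0)
    # Straight-through trick: hard values forward, soft gradients backward.
    return hard + soft - soft.detach()
```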
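
Finally, the Experiment Setup row pins down enough hyperparameters to wire up the training loop. The sketch below uses the quoted values (T = 8, 224 × 224 inputs, initial learning rate 0.002 decayed by 0.1 at epochs 20 and 40, λ = 0.1, 50 epochs, batch size 64); the SGD optimizer, its momentum value, and the tiny stand-in model are assumptions for illustration, since the quote does not name them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Values quoted in the Experiment Setup row.
T, SIZE = 8, 224                  # frames per clip, input resolution
LR, MILESTONES, GAMMA = 0.002, [20, 40], 0.1
LAMBDA_EFF = 0.1                  # weight of the efficiency loss term
EPOCHS, BATCH = 50, 64

# Stand-in model so the sketch runs end to end; the real model is an
# ImageNet-pretrained backbone with adaptive fusion modules, fed clips
# of shape (BATCH, T, 3, SIZE, SIZE). 174 = Something-Something classes.
FEAT = 256
model = nn.Linear(FEAT, 174)

optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=MILESTONES, gamma=GAMMA)  # step-wise decay by 0.1

for epoch in range(EPOCHS):
    # One synthetic batch per epoch stands in for the real data loader.
    feats = torch.randn(BATCH, FEAT)
    labels = torch.randint(0, 174, (BATCH,))
    usage = torch.rand(())  # placeholder: fraction of recomputed channels
    loss = F.cross_entropy(model(feats), labels) + LAMBDA_EFF * usage
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```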