EAC-Net: Efficient and Accurate Convolutional Network for Video Recognition
Authors: Bowei Jin, Zhuo Xu (pp. 11149-11156)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on Kinetics, our EAC-Nets achieved better results than TSM models with fewer FLOPs. With the same 2D backbones, EAC-Nets outperformed Non-Local I3D counterparts by achieving higher accuracy with about 7× fewer FLOPs. |
| Researcher Affiliation | Industry | Bowei Jin, Zhuo Xu, iFLYTEK Research, Suzhou, China, {bwjin, zhuoxu}@iflytek.com |
| Pseudocode | No | The paper describes the architecture and formulations of its proposed blocks (MGTE and ATE) using diagrams and mathematical equations, but it does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We perform comprehensive studies on the challenging Kinetics dataset (Kay et al. 2017). This dataset contains 246k training videos and 20k validation videos. It is a classification task involving 400 human action categories. We train all models on the training set and test on the validation set. The other dataset reported is Something-Something V1 (Goyal et al. 2017), which consists of 110k videos of 174 different low-level actions. In all experiments, our models are initialized with ImageNet (Russakovsky et al. 2015) pre-trained models. |
| Dataset Splits | Yes | This dataset contains 246k training videos and 20k validation videos. It is a classification task involving 400 human action categories. We train all models on the training set and test on the validation set. During training, we first sample 32 frames at a random rate from a video, and resize the shorter side of each sampled frame to a number picked randomly from 215 to 345. Then 224×224 random cropping is applied to these processed frames, yielding a network input of dimensions 32×3×224×224. For Kinetics, we train for up to 60 epochs, starting with a learning rate of 0.001 and a 10× reduction of the learning rate at epochs 30 and 50. (A code sketch of this preprocessing appears below the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper describes the model architecture and training procedures but does not specify any software dependencies or their version numbers (e.g., deep learning frameworks like PyTorch/TensorFlow, or Python version). |
| Experiment Setup | Yes | Implementation Details: During training, we first sample 32 frames at a random rate from a video, and resize the shorter side of each sampled frame to a number picked randomly from 215 to 345. Then 224×224 random cropping is applied to these processed frames, yielding a network input of dimensions 32×3×224×224. In all experiments, our models are initialized with ImageNet (Russakovsky et al. 2015) pre-trained models. For Kinetics, we train for up to 60 epochs, starting with a learning rate of 0.001 and a 10× reduction of the learning rate at epochs 30 and 50. We use a momentum of 0.9 and a weight decay of 5e-4. We then fine-tune models pre-trained on Kinetics on the Something-Something V1 dataset, where fine-tuning is conducted for 25 total epochs, starting with an initial learning rate of 0.001, reduced by a factor of 0.1 at epochs 10, 15, and 20. (A code sketch of this optimizer schedule appears below the table.) |
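
The paper names no framework, so the following is a minimal PyTorch-style sketch of the training-time preprocessing quoted above: sample 32 frames at a random rate, resize the shorter side to a random value in [215, 345], then take a random 224×224 crop. The function name, tensor layout, and the padding fallback for shorter sides below 224 are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def preprocess_clip(video: torch.Tensor) -> torch.Tensor:
    """Sketch of the quoted training preprocessing.

    Assumes `video` is a (T, 3, H, W) float tensor with T >= 32;
    all names and the tensor layout are illustrative.
    """
    num_frames, crop_size = 32, 224
    assert video.shape[0] >= num_frames, "sampling assumes >= 32 frames"

    # Sample 32 frames at a random stride ("random rate" in the paper).
    max_stride = video.shape[0] // num_frames
    stride = int(torch.randint(1, max_stride + 1, (1,)))
    start = int(torch.randint(0, video.shape[0] - (num_frames - 1) * stride, (1,)))
    clip = video[start : start + num_frames * stride : stride].float()

    # Resize so the shorter side becomes a random value in [215, 345].
    short_side = int(torch.randint(215, 346, (1,)))
    scale = short_side / min(clip.shape[2], clip.shape[3])
    clip = F.interpolate(clip, scale_factor=scale, mode="bilinear",
                         align_corners=False)

    # Assumption: pad when the resized shorter side falls below 224,
    # since the quoted range starts at 215 and the paper does not say
    # how that case is handled.
    pad_h = max(0, crop_size - clip.shape[2])
    pad_w = max(0, crop_size - clip.shape[3])
    if pad_h or pad_w:
        clip = F.pad(clip, (0, pad_w, 0, pad_h))

    # Random 224x224 crop, shared across all 32 frames.
    top = int(torch.randint(0, clip.shape[2] - crop_size + 1, (1,)))
    left = int(torch.randint(0, clip.shape[3] - crop_size + 1, (1,)))
    return clip[:, :, top : top + crop_size, left : left + crop_size]  # 32x3x224x224
```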
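
The momentum (0.9) and weight-decay (5e-4) values quoted above suggest SGD, though the paper names neither the optimizer nor the framework. A minimal sketch of the Kinetics learning-rate schedule under that assumption:

```python
import torch

# Placeholder network; stands in for an EAC-Net built on a 2D backbone.
model = torch.nn.Conv3d(3, 64, kernel_size=3)

# lr=0.001, momentum=0.9, weight decay=5e-4 are from the paper;
# SGD itself is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)

# Kinetics schedule: up to 60 epochs, LR divided by 10 at epochs 30 and 50.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 50], gamma=0.1)

for epoch in range(60):
    # ... one training epoch over Kinetics (omitted) ...
    scheduler.step()

# Fine-tuning on Something-Something V1 would instead run 25 epochs with
# milestones=[10, 15, 20], again starting from lr=0.001 with gamma=0.1.
```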