Temporal Gaussian Mixture Layer for Videos

Authors: AJ Piergiovanni, Michael Ryoo

ICML 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the effectiveness of TGM layers, significantly outperforming the state-of-the-arts. |
| Researcher Affiliation | Academia | Department of Computer Science, Indiana University. Correspondence to: AJ Piergiovanni <ajpiergi@indiana.edu>, Michael Ryoo <mryoo@indiana.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., a figure or section labeled "Algorithm" or "Pseudocode"). |
| Open Source Code | Yes | Code/models: https://github.com/piergiaj/tgm-icml19 |
| Open Datasets | Yes | We conducted our experiments on both THUMOS (Jiang et al., 2014) and MultiTHUMOS (Yeung et al., 2015) datasets... Charades (Sigurdsson et al., 2016b) is a large scale dataset... |
| Dataset Splits | Yes | There are 1010 validation videos and 1574 test videos. We used these continuous validation videos for the training of our models. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It mentions using I3D and Inception V3 as base CNNs but gives no information on the computational resources. |
| Software Dependencies | No | The paper mentions using I3D (Carreira & Zisserman, 2017) and Inception V3 (Szegedy et al., 2016) and their pretraining datasets (ImageNet, Kinetics), but does not provide specific version numbers for these software components or for any other libraries/frameworks such as PyTorch or TensorFlow. |
| Experiment Setup | Yes | Our default L setting used for the TGM layers as well as the other baselines was as follows: when using I3D segment features (3 features per second from 24fps videos), the 1 layer models used L = 15 and the 3 layer models used L = 5. When using Inception V3 frame feature (at 8 fps), the 1 layer models used L = 30 and the 3 layer models used L = 10. |
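
To make the L settings quoted under Experiment Setup easier to compare, the sketch below converts each one into an approximate temporal window in seconds. It is an illustration only, not the authors' code. The `Setting` class and its field names are hypothetical. The receptive-field formula assumes the stacked temporal layers use stride 1 with no dilation or pooling between them, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """One (feature type, depth, kernel length) combination from the quoted setup."""
    features: str      # feature extractor name
    rate_hz: float     # features per second (3 for I3D segments, 8 for Inception V3 frames)
    num_layers: int    # number of stacked temporal layers
    kernel_len: int    # temporal kernel length L

    def receptive_field(self) -> int:
        # ASSUMPTION: stride-1, undilated temporal convolutions stacked in sequence,
        # so each extra layer widens the window by (L - 1) feature steps.
        return self.num_layers * (self.kernel_len - 1) + 1

    def seconds(self) -> float:
        # Approximate duration covered, counting feature steps only.
        return self.receptive_field() / self.rate_hz


SETTINGS = [
    Setting("I3D", 3.0, 1, 15),
    Setting("I3D", 3.0, 3, 5),
    Setting("Inception V3", 8.0, 1, 30),
    Setting("Inception V3", 8.0, 3, 10),
]

# 24 fps video at 3 features per second means each I3D feature spans 8 frames.
FRAMES_PER_I3D_FEATURE = 24 // 3

if __name__ == "__main__":
    for s in SETTINGS:
        print(f"{s.features:13s} layers={s.num_layers} L={s.kernel_len:2d} "
              f"window={s.receptive_field():2d} features (~{s.seconds():.2f} s)")
```

Under these assumptions the four configurations cover roughly 3.5 to 5 seconds each, which suggests the 1-layer and 3-layer values of L were chosen to give comparable temporal windows. The paper itself should be consulted for the exact layer configuration.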