Temporal Gaussian Mixture Layer for Videos
Authors: AJ Piergiovanni, Michael S. Ryoo
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the effectiveness of TGM layers, significantly outperforming the state-of-the-arts. |
| Researcher Affiliation | Academia | Department of Computer Science, Indiana University. Correspondence to: AJ Piergiovanni <ajpiergi@indiana.edu>, Michael Ryoo <mryoo@indiana.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., a figure or section labeled "Algorithm" or "Pseudocode"). |
| Open Source Code | Yes | Code/models: https://github.com/piergiaj/tgm-icml19 |
| Open Datasets | Yes | We conducted our experiments on both THUMOS (Jiang et al., 2014) and MultiTHUMOS (Yeung et al., 2015) datasets... Charades (Sigurdsson et al., 2016b) is a large scale dataset... |
| Dataset Splits | Yes | There are 1010 validation videos and 1574 test videos. We used these continuous validation videos for the training of our models. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It mentions using I3D and Inception V3 as base CNNs but provides no information about the computational resources used. |
| Software Dependencies | No | The paper mentions using I3D (Carreira & Zisserman, 2017) and Inception V3 (Szegedy et al., 2016) and their pretraining datasets (ImageNet, Kinetics), but does not provide specific version numbers for these software components or any other libraries/frameworks like PyTorch or TensorFlow. |
| Experiment Setup | Yes | Our default L setting used for the TGM layers as well as the other baselines was as follows: when using I3D segment features (3 features per second from 24fps videos), the 1 layer models used L = 15 and the 3 layer models used L = 5. When using Inception V3 frame features (at 8 fps), the 1 layer models used L = 30 and the 3 layer models used L = 10. (See the sketch after this table.) |
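
The L values above set the temporal extent of the Gaussian mixture kernels learned by each TGM layer. The snippet below is a minimal PyTorch sketch of such a layer, assuming only the paper's description: a set of Gaussians with learnable centers and widths is sampled over a window of length L and mixed by soft-attention weights into temporal convolution kernels applied to per-frame or per-segment features. Class and parameter names (`TGMLayer`, `n_gaussians`, `length`) are illustrative and are not taken from the authors' released code at https://github.com/piergiaj/tgm-icml19.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TGMLayer(nn.Module):
    """Sketch of a temporal Gaussian mixture (TGM) layer.

    n_gaussians Gaussians with learnable centers/widths are sampled over a
    temporal window of length L, then combined by soft-attention mixing
    weights into (c_out x c_in) temporal kernels used as a 1-D convolution
    over the feature sequence. Hyperparameter names are illustrative.
    """

    def __init__(self, c_in, c_out, n_gaussians=16, length=15):
        super().__init__()
        self.L = length
        self.c_in, self.c_out = c_in, c_out
        # unconstrained Gaussian parameters; squashed to valid ranges below
        self.center = nn.Parameter(torch.randn(n_gaussians))
        self.width = nn.Parameter(torch.randn(n_gaussians))
        # soft-attention mixing weights over the Gaussians
        self.mix = nn.Parameter(torch.randn(c_out * c_in, n_gaussians))

    def gaussian_kernels(self):
        # centers constrained to [0, L-1]; widths kept strictly positive
        mu = (torch.tanh(self.center) + 1.0) * 0.5 * (self.L - 1)
        sigma = torch.exp(self.width).clamp(min=1e-3)
        t = torch.arange(self.L, dtype=mu.dtype, device=mu.device)
        # (n_gaussians, L) kernels, each normalized to sum to 1 over time
        k = torch.exp(-((t.unsqueeze(0) - mu.unsqueeze(1)) ** 2)
                      / (2.0 * sigma.unsqueeze(1) ** 2))
        return k / k.sum(dim=1, keepdim=True)

    def forward(self, x):
        # x: (batch, c_in, T) sequence of per-frame/segment features
        k = self.gaussian_kernels()                      # (M, L)
        w = F.softmax(self.mix, dim=1)                   # (c_out*c_in, M)
        kernels = (w @ k).view(self.c_out, self.c_in, self.L)
        return F.conv1d(x, kernels, padding=self.L // 2)
```

A hypothetical call matching the paper's default 1-layer I3D setting would be `TGMLayer(c_in=1024, c_out=1024, n_gaussians=16, length=15)` applied to features of shape `(batch, 1024, T)`, returning a tensor of the same temporal length. The released implementation may structure the channel mixing differently, so treat this purely as an illustration of how Gaussian-mixture temporal kernels with length L can be constructed and applied.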