Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks
Authors: Iulia Duta, Andrei Nicolicioiu, Marius Leordeanu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive ablation studies and experiments on two challenging datasets, we show superior performance to previous graph neural networks models for video classification. |
| Researcher Affiliation | Collaboration | Iulia Duta, Bitdefender, Romania (id366@cam.ac.uk); Andrei Nicolicioiu, Bitdefender, Romania (anicolicioiu@bitdefender.com); Marius Leordeanu, Bitdefender, Romania; Institute of Mathematics of the Romanian Academy; University "Politehnica" of Bucharest (marius.leordeanu@imar.ro) |
| Pseudocode | No | The paper describes the model using equations and text but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our method can be found in our repository: https://github.com/bit-ml/DyReg-GNN |
| Open Datasets | Yes | We test our model on two video classification datasets that seem to offer the best advantages, being large enough and requiring abilities to model complex interactions. We evaluate on real-world datasets, Something-Something-V1&V2 [74], while we also test on a variant of the SyncMNIST [3] dataset |
| Dataset Splits | Yes | Something-Something-V1&V2 [74] datasets classify scenes involving human-object complex interactions. They consist of 86K / 169K training videos and 11K / 25K validation videos, having 174 classes. The SyncMNIST dataset contains 600k training videos and 10k validation videos with 10 frames each. |
| Hardware Specification | No | The paper states training was done "on two GPUs" or "on a single GPU" but does not specify the exact GPU models, CPU, memory, or any other specific hardware components used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any ancillary software dependencies (e.g., programming languages, libraries, or frameworks like Python, PyTorch, or TensorFlow). |
| Experiment Setup | Yes | In all experiments we follow the training setting of [67], using 16 frames resized to have the shorter side of size 256, and randomly sample a crop of size 224 × 224. For the evaluations, we follow the setting in [67] of taking 3 spatial crops of size 256 × 256 with 2 temporal samplings and averaging their results. For training, we use SGD optimizer with learning rate 0.001 and momentum 0.9, using a total batch-size of 10, trained on two GPUs. We decrease the learning rate by a factor of 10 three times when the optimisation reaches a plateau. |
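
The experiment-setup row above is the most directly actionable part of this report. The sketch below illustrates, assuming a PyTorch pipeline (the released DyReg-GNN repository is PyTorch-based), how the quoted hyperparameters could be wired together: resize the shorter side to 256 with a random 224 × 224 crop, SGD with learning rate 0.001 and momentum 0.9, and a tenfold learning-rate drop on validation plateau. The placeholder model, the `training_step` helper, and the `patience` value are illustrative assumptions, not the authors' code.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision import transforms

# Frame preprocessing for training: resize the shorter side to 256,
# then take a random 224 x 224 crop, as quoted from the paper.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])

# Placeholder standing in for the DyReg-GNN video classifier
# (174 classes for Something-Something-V1/V2).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 174))

# SGD with learning rate 0.001 and momentum 0.9 (total batch size 10 in the paper).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Drop the learning rate by a factor of 10 when validation loss plateaus;
# the paper applies this reduction three times. The patience value here is a guess.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

criterion = nn.CrossEntropyLoss()

def training_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of already-transformed frames."""
    optimizer.zero_grad()
    loss = criterion(model(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# After each epoch, step the scheduler with the validation loss:
#   scheduler.step(val_loss)
```

For evaluation, the paper averages predictions over 3 spatial crops of 256 × 256 combined with 2 temporal samplings per video; that test-time averaging is not shown in the sketch above.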