Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks

Authors: Iulia Duta, Andrei Nicolicioiu, Marius Leordeanu

NeurIPS 2021

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "In extensive ablation studies and experiments on two challenging datasets, we show superior performance to previous graph neural networks models for video classification."
Researcher Affiliation: Collaboration. Iulia Duta (Bitdefender, Romania; id366@cam.ac.uk); Andrei Nicolicioiu* (Bitdefender, Romania; anicolicioiu@bitdefender.com); Marius Leordeanu (Bitdefender, Romania; Institute of Mathematics of the Romanian Academy; University "Politehnica" of Bucharest; marius.leordeanu@imar.ro).
Pseudocode: No. The paper describes the model using equations and text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. "The code for our method can be found in our repository": https://github.com/bit-ml/DyReg-GNN
Open Datasets: Yes. "We test our model on two video classification datasets that seem to offer the best advantages, being large enough and requiring abilities to model complex interactions. We evaluate on real-world datasets, Something-Something-V1&V2 [74], while we also test on a variant of the SyncMNIST [3] dataset."
Dataset Splits: Yes. "Something-Something-V1&V2 [74] datasets classify scenes involving human-object complex interactions. They consist of 86K / 169K training videos and 11K / 25K validation videos, having 174 classes." And, for the SyncMNIST variant: "The dataset contains 600k training videos and 10k validation videos with 10 frames each."
Hardware Specification: No. The paper states training was done "on two GPUs" or "on a single GPU" but does not specify the exact GPU models, CPU, memory, or any other specific hardware components used for the experiments.
Software Dependencies: No. The paper does not provide specific version numbers for any ancillary software dependencies (e.g., programming languages, libraries, or frameworks like Python, PyTorch, or TensorFlow).
Experiment Setup: Yes. "In all experiments we follow the training setting of [67], using 16 frames resized to have the shorter side of size 256, and randomly sample a crop of size 224×224. For the evaluations, we follow the setting in [67] of taking 3 spatial crops of size 256×256 with 2 temporal samplings and averaging their results. For training, we use SGD optimizer with learning rate 0.001 and momentum 0.9, using a total batch-size of 10, trained on two GPUs. We decrease the learning rate by a factor of 10 three times when the optimisation reaches a plateau."
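The multi-crop evaluation protocol quoted above (3 spatial crops × 2 temporal samplings, with results averaged) can be sketched as follows. This is a minimal illustration, not the authors' code: `run_model` is a hypothetical stand-in for the network's forward pass, returning per-class scores for the 174 Something-Something classes.

```python
# Hedged sketch of the paper's evaluation protocol: for each video, score
# every (spatial crop, temporal sampling) combination and average the
# per-class scores. All names here are illustrative placeholders.

NUM_CLASSES = 174  # Something-Something-V1&V2 class count


def run_model(crop):
    """Placeholder for the actual network forward pass on one crop."""
    return [float(i % 3) for i in range(NUM_CLASSES)]


def evaluate_video(video_crops):
    """Average class scores over all crop/sampling combinations."""
    scores = [0.0] * NUM_CLASSES
    for crop in video_crops:
        out = run_model(crop)
        scores = [s + o for s, o in zip(scores, out)]
    return [s / len(video_crops) for s in scores]


# 3 spatial crops x 2 temporal samplings = 6 evaluations per video
crops = [f"crop_{i}" for i in range(6)]
avg_scores = evaluate_video(crops)
```

In a real pipeline the averaged scores would then be argmax-ed to produce the predicted class for each validation video.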