Delving Deeper into Convolutional Networks for Learning Video Representations

Authors: Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our approach on both Human Action Recognition and Video Captioning tasks.
Researcher Affiliation | Academia | 1 MILA, Université de Montréal. 2 École Polytechnique de Montréal.
Pseudocode | No | The paper provides mathematical equations for the GRU and GRU-RCN models but does not present them as structured pseudocode or algorithm blocks (a minimal ConvGRU sketch is given below the table).
Open Source Code | No | The paper does not provide any concrete access information (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology described.
Open Datasets | Yes | We evaluate our approach on the UCF101 dataset Soomro et al. (2012). This dataset has 101 action classes spanning over 13320 YouTube video clips.
Dataset Splits | Yes | We report results on the UCF101 first split, as this is the most commonly used split in the literature. To perform a proper hyperparameter search, we use the videos from the UCF-Thumos validation split Jiang et al. (2014) as the validation set (a split-loading sketch is given below the table).
Hardware Specification | No | The paper acknowledges 'computing support' from agencies such as Compute Canada, but it does not give specific details on the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions Theano but does not give a version number. It also mentions the Adam and Adadelta optimizers, but not the specific library versions used to implement them.
Experiment Setup | Yes | At each iteration, a batch of 64 videos is sampled randomly from the training set. To perform scale augmentation, we randomly sample the cropping width and height from {256, 224, 192, 168}. The temporal cropping size is set to 10. We then resize the cropped volume to 224x224x10. We use Adam (Kingma & Ba, 2014) with the gradient computed by the backpropagation algorithm. We perform early stopping and choose the parameters that maximize the log-probability of the validation set (an augmentation sketch is given below the table).
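
Although the paper presents the GRU-RCN only as equations, the recurrence is straightforward to state in code: the fully-connected products of a standard GRU are replaced by 2D convolutions over the feature maps, with the reset gate applied to the previous hidden maps before their convolution. The sketch below is a minimal illustration in PyTorch, not the authors' Theano implementation; the kernel size, padding, and channel counts are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell in the spirit of the paper's GRU-RCN equations."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2  # preserve the spatial resolution of the maps
        # Gates: update (z) and reset (r), computed from x_t and h_{t-1}.
        self.conv_x_gates = nn.Conv2d(in_channels, 2 * hidden_channels,
                                      kernel_size, padding=pad)
        self.conv_h_gates = nn.Conv2d(hidden_channels, 2 * hidden_channels,
                                      kernel_size, padding=pad, bias=False)
        # Candidate activation, with the reset gate applied to h_{t-1}
        # before its convolution, as in the paper's formulation.
        self.conv_x_cand = nn.Conv2d(in_channels, hidden_channels,
                                     kernel_size, padding=pad)
        self.conv_h_cand = nn.Conv2d(hidden_channels, hidden_channels,
                                     kernel_size, padding=pad, bias=False)

    def forward(self, x_t, h_prev):
        # x_t: (B, C_in, H, W), h_prev: (B, C_hidden, H, W)
        xz, xr = self.conv_x_gates(x_t).chunk(2, dim=1)
        hz, hr = self.conv_h_gates(h_prev).chunk(2, dim=1)
        z = torch.sigmoid(xz + hz)  # update gate
        r = torch.sigmoid(xr + hr)  # reset gate
        h_tilde = torch.tanh(self.conv_x_cand(x_t)
                             + self.conv_h_cand(r * h_prev))
        return (1 - z) * h_prev + z * h_tilde


# Example usage (hypothetical shapes): roll the cell over a 10-step clip.
# cell = ConvGRUCell(in_channels=512, hidden_channels=256)
# h = torch.zeros(batch, 256, height, width)
# for t in range(10):
#     h = cell(features[:, t], h)
```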
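For the data splits, the paper relies on the official UCF101 "first split" and uses the UCF-Thumos validation list for model selection. The sketch below shows one way to read the official split files; it assumes the standard ucfTrainTestlist layout distributed with UCF101 (classInd.txt, trainlist01.txt, testlist01.txt) and does not reproduce the UCF-Thumos validation carve-out.

```python
from pathlib import Path


def load_ucf101_first_split(split_dir):
    """Read the official UCF101 split-1 lists into (video_path, label) pairs."""
    split_dir = Path(split_dir)

    # classInd.txt lines look like "1 ApplyEyeMakeup".
    label_of = {}
    for line in (split_dir / "classInd.txt").read_text().splitlines():
        idx, name = line.split()
        label_of[name] = int(idx) - 1  # zero-based labels

    def parse(list_name):
        items = []
        for line in (split_dir / list_name).read_text().splitlines():
            if not line.strip():
                continue
            path = line.split()[0]        # e.g. "ApplyEyeMakeup/v_..._c01.avi"
            cls = path.split("/")[0]
            items.append((path, label_of[cls]))
        return items

    train = parse("trainlist01.txt")      # first split, as used in the paper
    test = parse("testlist01.txt")
    return train, test
```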
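The Experiment Setup row describes the training-time augmentation concretely enough to sketch: crop height and width are drawn from {256, 224, 192, 168}, a 10-frame temporal window is cut, and the cropped volume is resized to 224x224x10. The snippet below is an illustrative re-implementation under those numbers, not the authors' code; whether height and width are sampled independently, and the resize interpolation, are assumptions.

```python
import random

import torch
import torch.nn.functional as F

CROP_SIZES = (256, 224, 192, 168)   # spatial crop sizes named in the paper
TEMPORAL_LEN = 10                   # frames per clip
TARGET_HW = 224                     # final spatial resolution


def augment_clip(video):
    """Random scale + temporal crop, then resize to 224x224x10.

    video: float tensor of shape (T, C, H, W); frames are assumed to be at
    least as large as the sampled crop.
    """
    T, C, H, W = video.shape
    ch, cw = random.choice(CROP_SIZES), random.choice(CROP_SIZES)
    t0 = random.randint(0, T - TEMPORAL_LEN)
    y0 = random.randint(0, H - ch)
    x0 = random.randint(0, W - cw)
    clip = video[t0:t0 + TEMPORAL_LEN, :, y0:y0 + ch, x0:x0 + cw]
    # Resize every frame of the cropped volume to 224x224 (time acts as batch).
    clip = F.interpolate(clip, size=(TARGET_HW, TARGET_HW),
                         mode="bilinear", align_corners=False)
    return clip  # shape: (10, C, 224, 224)
```

Training as described then pairs batches of 64 such clips with the Adam optimizer (e.g. torch.optim.Adam in this illustrative framework) and early stopping on the validation log-probability.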