Differentiable Grammars for Videos

Authors: AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo (pp. 11874-11881)

AAAI 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It outperforms the state-of-the-art on several challenging datasets and is more accurate for forecasting future activities in videos. |
| Researcher Affiliation | Industry | AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo. Robotics at Google. {ajpiergi, anelia, mryoo}@google.com |
| Pseudocode | Yes | Algorithm 1: The training of the grammar, with multiple branches |
| Open Source Code | No | We plan to open-source the code. |
| Open Datasets | Yes | MLB-YouTube (Piergiovanni and Ryoo 2018a), Charades (Sigurdsson et al. 2016b), and MultiTHUMOS (Yeung et al. 2015). We also compare on 50 Salads (Stein and McKenna 2013). |
| Dataset Splits | No | The paper uses standard datasets but does not explicitly provide train/validation/test splits (percentages or counts), nor does it reference a standard, predefined splitting protocol for these experiments. |
| Hardware Specification | No | The paper does not provide hardware details such as CPU/GPU models, memory, or cloud instances used for running the experiments. |
| Software Dependencies | No | We implemented our models in PyTorch. |
| Experiment Setup | Yes | The learning rate was set to 0.1, decayed every 50 epochs by 10, and the models were trained for 400 epochs. We pruned the number of branches to 2048 by random selection. The number of grammar parameters varies by dataset, driven by the number of classes: MLB-YouTube has 8 terminals (for 8 classes), 5 rules per non-terminal, and 8 non-terminals; Charades has 157 terminals, 10 rules per non-terminal, and 1000 non-terminals. The LSTM has 1000 hidden units for all datasets. |
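The reported schedule (learning rate 0.1, divided by 10 every 50 epochs, 400 epochs total) and the random pruning of branches to 2048 can be sketched as below. This is a minimal illustration, not the authors' code: the helper names are mine, and in PyTorch (which the paper uses) the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)`.

```python
import random

def lr_at_epoch(epoch, base_lr=0.1, step=50, gamma=0.1):
    """Step decay as reported in the paper: lr = base_lr * gamma ** (epoch // step)."""
    return base_lr * gamma ** (epoch // step)

def prune_branches(branches, keep=2048, seed=0):
    """Random-selection pruning to a fixed branch budget (helper name is illustrative)."""
    if len(branches) <= keep:
        return list(branches)
    rng = random.Random(seed)
    return rng.sample(list(branches), keep)

# Learning rate over the 400-epoch run: 0.1 for epochs 0-49, 0.01 for 50-99, etc.
print(lr_at_epoch(0))    # 0.1
print(lr_at_epoch(55))   # first decay has fired: ~0.01

# Pruning a hypothetical pool of 10,000 expansion branches down to 2048.
kept = prune_branches(range(10_000))
print(len(kept))         # 2048
```

The step decay here matches the textual description exactly; the optimizer itself (e.g. SGD vs. Adam) is not specified in the quoted setup, so it is left out of the sketch.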