MOMA: Multi-Object Multi-Actor Activity Parsing

Authors: Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, Fei-Fei Li

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we examine three different tasks using the MOMA dataset. ... For all three experiments, we split 80% of data for training, and hold out the rest for validation. The major hyperparameter we focus on is the loss weight assigned for each output head, while we also tune on model hidden size and initial learning rate. We use an 8-GPU (Tesla V100) environment for training video feature extractor and fine-tuning on the action hypergraph. We demonstrate the significance of action hypergraph representation through these three experiments and the carefully designed baselines." (a loss-weighting sketch follows this table)
Researcher Affiliation | Academia | "Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei. Stanford University. {alanzluo, wanzexie, siddkap, isaliang, coopermj, jniebles, eadeli, feifeili}@stanford.edu"
Pseudocode | No | The paper includes a structural diagram of the HGAP model (Figure 4) and describes its components, but it does not provide pseudocode or an algorithm block.
Open Source Code | No | "Code, data, and further instructions will be released at https://moma.stanford.edu/."
Open Datasets | Yes | "Lastly, we introduce the MOMA (Multi-Object, Multi-Actor) dataset..." The proposed MOMA dataset will be released at https://moma.stanford.edu/.
Dataset Splits | Yes | "For all three experiments, we split 80% of data for training, and hold out the rest for validation." (a split sketch follows this table)
Hardware Specification | Yes | "We use an 8-GPU (Tesla V100) environment for training video feature extractor and fine-tuning on the action hypergraph."
Software Dependencies | No | The paper does not provide specific software names with version numbers for its dependencies (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | "The major hyperparameter we focus on is the loss weight assigned for each output head, while we also tune on model hidden size and initial learning rate. ... The input to the video model includes a trimmed video sequence $v = \{i^{(1)}, i^{(2)}, \ldots, i^{(N)}\}$ of N image frames with a sampling rate of 8 and N = 16. We pretrain the X3D_L model on Kinetics400 [30]..." (a frame-sampling sketch follows this table)
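
The "loss weight assigned for each output head" in the Research Type and Experiment Setup rows describes a standard weighted multi-task objective. A minimal sketch of what that might look like; the head names and weight values here are illustrative assumptions, since the paper reports tuning the weights but not their final values:

```python
import torch.nn.functional as F

# Hypothetical per-head weights; the paper tunes these but does not report them.
LOSS_WEIGHTS = {"activity": 1.0, "sub_activity": 0.5, "atomic_action": 0.5}

def multi_head_loss(outputs: dict, targets: dict):
    """Weighted sum of per-head cross-entropy losses.

    `outputs[h]` holds the logits and `targets[h]` the class indices for
    output head `h`; the weighted sum is what would be backpropagated.
    """
    return sum(w * F.cross_entropy(outputs[h], targets[h])
               for h, w in LOSS_WEIGHTS.items())
```

Tuning the relative weights (rather than the architecture) is the usual way to balance heads whose losses sit on different scales, which is consistent with the paper calling the weights its "major hyperparameter".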
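The 80/20 train/validation split in the Dataset Splits row is stated only at a high level. A minimal sketch of how such a split might be reproduced; the random seed and per-sample granularity are assumptions, not details from the paper:

```python
import numpy as np

def train_val_split(num_samples: int, train_frac: float = 0.8, seed: int = 0):
    """Shuffle sample indices and hold out the last 20% for validation.

    The seed and the per-sample (rather than per-video) granularity are
    assumptions; the paper states only an 80/20 train/validation split.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    cut = int(train_frac * num_samples)
    return idx[:cut], idx[cut:]

train_idx, val_idx = train_val_split(1000)  # 800 train, 200 validation
```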
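The input clip in the Experiment Setup row, N = 16 frames at a sampling rate of 8, spans a 16 x 8 = 128-frame temporal window. A sketch of the index arithmetic; centering the window inside the trimmed video is an assumption, as the paper specifies only the clip length and sampling rate:

```python
import numpy as np

def sample_clip_indices(num_frames: int, clip_len: int = 16, rate: int = 8):
    """Pick clip_len frame indices spaced `rate` frames apart.

    Centering the 128-frame window inside the trimmed video is an
    assumption; the paper states only clip_len = 16 and rate = 8.
    """
    span = clip_len * rate                    # 128-frame temporal window
    start = max(0, (num_frames - span) // 2)  # center the window
    idx = start + rate * np.arange(clip_len)  # start, start+8, start+16, ...
    return np.clip(idx, 0, num_frames - 1)    # guard against short videos

indices = sample_clip_indices(300)  # 16 indices into a 300-frame video
```

The resulting frames would then be fed to the X3D_L backbone that the paper pretrains on Kinetics400.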