MOMA: Multi-Object Multi-Actor Activity Parsing
Authors: Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we examine three different tasks using the MOMA dataset. ... For all three experiments, we split 80% of data for training, and hold out the rest for validation. The major hyperparameter we focus on is the loss weight assigned for each output head, while we also tune on model hidden size and initial learning rate. We use an 8-GPU (Tesla V100) environment for training video feature extractor and fine-tuning on the action hypergraph. We demonstrate the significance of action hypergraph representation through these three experiments and the carefully designed baselines. |
| Researcher Affiliation | Academia | Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei Stanford University {alanzluo, wanzexie, siddkap, isaliang, coopermj, jniebles, eadeli, feifeili}@stanford.edu |
| Pseudocode | No | The paper includes a structural diagram of the HGAP model (Figure 4) and describes its components, but it does not provide pseudocode or an algorithm block. |
| Open Source Code | No | Code, data, and further instructions will be released at https://moma.stanford.edu/. |
| Open Datasets | Yes | Lastly, we introduce the MOMA (Multi-Object, Multi-Actor) dataset... Yes, the proposed MOMA dataset will be released at https://moma.stanford.edu/. |
| Dataset Splits | Yes | For all three experiments, we split 80% of data for training, and hold out the rest for validation. (A minimal split sketch follows this table.) |
| Hardware Specification | Yes | We use an 8-GPU (Tesla V100) environment for training video feature extractor and fine-tuning on the action hypergraph. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for its dependencies (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | The major hyperparameter we focus on is the loss weight assigned for each output head, while we also tune on model hidden size and initial learning rate. ... The input to the video model includes a trimmed video sequence v = {i^(1), i^(2), ..., i^(N)} of N image frames, with a sample rate of 8 and N = 16. We pretrain the X3D-L model on Kinetics-400 [30]... (Sketches of the loss weighting and clip sampling follow this table.) |
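The 80/20 train/validation split quoted in the Dataset Splits row maps onto a standard PyTorch pattern. The sketch below is an assumption about how such a split could be implemented; the `TensorDataset` placeholder stands in for the real MOMA loader, which the paper does not describe.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder standing in for the MOMA video dataset: the paper does not
# describe its data loader, so random tensors are used purely to show the split.
full_set = TensorDataset(torch.randn(100, 8))

n_train = int(0.8 * len(full_set))               # 80% of the data for training
train_set, val_set = random_split(
    full_set,
    [n_train, len(full_set) - n_train],          # hold out the remaining 20% for validation
    generator=torch.Generator().manual_seed(0),  # fixed seed keeps the split reproducible
)
print(len(train_set), len(val_set))              # 80 20
```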
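The Experiment Setup row singles out the per-head loss weight as the main hyperparameter. Below is a minimal sketch of a weighted multi-head objective; the head names (activity, sub-activity, atomic action, mirroring MOMA's activity hierarchy) and the weight values are illustrative assumptions, not the authors' tuned configuration.

```python
import torch
import torch.nn as nn

# Illustrative per-head loss weights: the paper tunes these values but does
# not report them, so the numbers below are assumptions.
loss_weights = {"activity": 1.0, "sub_activity": 0.5, "atomic_action": 0.25}
criterion = nn.CrossEntropyLoss()

def multi_head_loss(logits: dict, targets: dict) -> torch.Tensor:
    """Weighted sum of per-head cross-entropy losses."""
    return sum(loss_weights[h] * criterion(logits[h], targets[h]) for h in logits)

# Toy usage: three 10-way classification heads over a batch of 4 clips.
logits = {h: torch.randn(4, 10) for h in loss_weights}
targets = {h: torch.randint(0, 10, (4,)) for h in loss_weights}
print(multi_head_loss(logits, targets))
```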
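The same row specifies clips of N = 16 frames sampled at a stride of 8, fed to an X3D-L backbone pretrained on Kinetics-400. The sketch below assumes the X3D-L entry point in the PyTorchVideo torch.hub model zoo; whether the authors used this exact checkpoint, input resolution, or head configuration is an assumption.

```python
import torch

# Clip sampling as quoted: N = 16 frames at a stride ("sample rate") of 8,
# i.e. indices 0, 8, 16, ..., 120 out of a 128-frame window.
N, stride = 16, 8
frame_indices = [i * stride for i in range(N)]

# X3D-L pretrained on Kinetics-400, loaded from the PyTorchVideo model zoo
# (assumed to match the paper's backbone; downloads weights on first call).
model = torch.hub.load("facebookresearch/pytorchvideo", "x3d_l", pretrained=True)
model.eval()

# X3D-L is commonly trained on 312x312 crops; the resolution here is an assumption.
clip = torch.randn(1, 3, N, 312, 312)  # (batch, channels, frames, height, width)
with torch.no_grad():
    out = model(clip)  # Kinetics-400 class logits; drop the head to use it as a feature extractor
print(out.shape)       # torch.Size([1, 400])
```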