Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MOMA: Multi-Object Multi-Actor Activity Parsing
Authors: Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, Fei-Fei Li
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we examine three different tasks using the MOMA dataset. ... For all three experiments, we split 80% of data for training, and hold out the rest for validation. The major hyperparameter we focus on is the loss weight assigned for each output head, while we also tune on model hidden size and initial learning rate. We use an 8-GPU (Tesla V100) environment for training video feature extractor and fine-tuning on the action hypergraph. We demonstrate the significance of action hypergraph representation through these three experiments and the carefully designed baselines. |
| Researcher Affiliation | Academia | Zelun Luo , Wanze Xie , Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei Stanford University EMAIL |
| Pseudocode | No | The paper includes a structural diagram of the HGAP model (Figure 4) and describes its components, but it does not provide pseudocode or an algorithm block. |
| Open Source Code | No | Code, data, and further instructions will be released at https://moma.stanford.edu/. |
| Open Datasets | Yes | Lastly, we introduce the MOMA (Multi-Object, Multi-Actor) dataset... Yes, the proposed MOMA dataset will be released at https://moma.stanford.edu/. |
| Dataset Splits | Yes | For all three experiments, we split 80% of data for training, and hold out the rest for validation. |
| Hardware Specification | Yes | We use an 8-GPU (Tesla V100) environment for training video feature extractor and fine-tuning on the action hypergraph. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for its dependencies (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | The major hyperparameter we focus on is the loss weight assigned for each output head, while we also tune on model hidden size and initial learning rate. ... The input to the video model includes a trimmed video sequence v = {i(1), i(2), . . . , i(N)} of N image frames with sample rate of 8 with N = 16. We pretrain the X3D_L model on Kinetics400 [30]... |
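The split and input-sampling details quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the function names and the random stride placement are assumptions; only the constants (80/20 split, N = 16 frames, sample rate 8) come from the paper's quoted setup.

```python
import random

def sample_frame_indices(num_frames, n=16, rate=8):
    """Pick n frame indices at a fixed stride, matching the quoted input spec:
    a trimmed clip v = {i(1), ..., i(N)} with N = 16 and sample rate 8.
    The random start offset is an assumption; the paper does not specify it."""
    span = n * rate
    start = 0 if num_frames <= span else random.randrange(num_frames - span + 1)
    # Clamp to the last frame for clips shorter than the sampling span.
    return [min(start + k * rate, num_frames - 1) for k in range(n)]

def train_val_split(items, train_frac=0.8, seed=0):
    """Shuffle and hold out the remainder for validation (80/20 split)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]
```

For example, `sample_frame_indices(200)` returns 16 monotonically increasing indices within a 200-frame clip, and `train_val_split(range(100))` yields 80 training and 20 validation items.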