PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception

Authors: Aviv Netanyahu, Tianmin Shu, Boris Katz, Andrei Barbu, Joshua B. Tenenbaum

AAAI 2021

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feedforward neural networks. We conduct two human experiments to evaluate the quality of this dataset, and propose two machine social perception tasks on it. We test state-of-the-art methods based on feed-forward neural networks and show that they fail to understand or predict many of these social interactions. Table 1 and Table 2 summarize the performance of all methods in the two tasks.
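Although the paper gives no pseudocode (see the Pseudocode entry below), the general pattern behind SIMPLE-style Bayesian inverse planning can be illustrated. The following is a minimal sketch, assuming a black-box `simulate(goal)` planner that returns a trajectory of the same shape as the observation; all names are hypothetical and this is not the authors' implementation:

```python
import numpy as np

def goal_posterior(observed_traj, goals, simulate, prior=None, beta=1.0):
    """Hypothetical sketch of Bayesian inverse planning: score each
    candidate goal by how well a planner's simulated trajectory matches
    the observed one, then normalize into a posterior over goals."""
    if prior is None:
        prior = np.full(len(goals), 1.0 / len(goals))   # uniform prior over goals
    log_scores = []
    for g, p in zip(goals, prior):
        sim_traj = simulate(g)                           # plan as if goal g were true
        err = np.sum(np.linalg.norm(observed_traj - sim_traj, axis=-1))
        log_scores.append(np.log(p) - beta * err)        # likelihood ~ exp(-beta * error)
    log_scores = np.array(log_scores)
    log_scores -= log_scores.max()                       # stabilize the softmax
    post = np.exp(log_scores)
    return post / post.sum()                             # posterior p(goal | trajectory)
```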
Researcher Affiliation: Academia. Aviv Netanyahu*, Tianmin Shu*, Boris Katz, Andrei Barbu, Joshua B. Tenenbaum; Massachusetts Institute of Technology, Cambridge, MA 02139; {avivn, tshu, boris, abarbu, jbt}@mit.edu
Pseudocode: No. The paper describes the Hierarchical Planner and Bayesian Inverse Planning in prose and through equations, but does not include structured pseudocode blocks or formally labeled algorithm sections.
Open Source Code: No. The paper states: 'The dataset and the supplementary material are available at https://www.tshu.io/PHASE.' While supplementary material often includes code, this statement does not unambiguously confirm that the source code for the methodology (e.g., SIMPLE or the simulation engine) is directly released.
Open Datasets: Yes. In this work, we create a dataset of physically-grounded abstract social events, PHASE. The dataset and the supplementary material are available at https://www.tshu.io/PHASE.
Dataset Splits: Yes. PHASE consists of 500 video animations depicting diverse social interactions. With these 500 videos, we create a training set of 320 videos, a validation set of 80 videos, and a testing set of 100 videos.
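For concreteness, a minimal sketch of a fixed 320/80/100 split over video indices follows; the seed and the actual assignment used by the authors are not published, so this is only an illustrative assumption:

```python
import random

def split_videos(num_videos=500, seed=0):
    """Hypothetical sketch of a 320/80/100 train/val/test split over
    video indices; the paper's actual assignment is not published."""
    idx = list(range(num_videos))
    random.Random(seed).shuffle(idx)   # fixed seed for a reproducible split
    train, val, test = idx[:320], idx[320:400], idx[400:]
    assert len(train) == 320 and len(val) == 80 and len(test) == 100
    return train, val, test
```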
Hardware Specification: No. The paper does not provide any specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies: No. The paper mentions frameworks and algorithms such as Dec-POMDP, A*, POMCP, a 2-layer MLP, a 2-level LSTM, ARG, Social-LSTM, and STGAT, but it does not provide version numbers for any software libraries, tools, or environments used to implement them or to conduct the experiments.
Experiment Setup: Yes. For the first task, joint goal and relation inference, we compare our model, SIMPLE (with 15 particles and 6 iterations)... We sample a time interval with a fixed length, T, based on the errors between the simulation and the observations, i.e., $t_{l,m} \propto e^{\eta \sum_{\tau=t_{l,m}}^{t_{l,m}+T} \|\hat{s}^{\tau}_{l,m} - s^{\tau}\|^2}$, where $\eta = 0.1$.
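A minimal sketch of the error-weighted interval sampling described by the quoted formula, assuming `obs` and `sim` are (time, state_dim) arrays of observed and simulated states; the function name and weighting details beyond the formula are assumptions, not the paper's code:

```python
import numpy as np

def sample_interval(obs, sim, T, eta=0.1, rng=None):
    """Hypothetical sketch of error-weighted interval sampling: each
    candidate start time t is weighted by exp(eta * sum of squared
    simulation-vs-observation errors over [t, t+T])."""
    rng = rng or np.random.default_rng()
    num_starts = len(obs) - T
    errs = np.array([
        np.sum(np.linalg.norm(obs[t:t + T] - sim[t:t + T], axis=-1) ** 2)
        for t in range(num_starts)
    ])
    w = np.exp(eta * (errs - errs.max()))     # subtract max for numerical stability
    t0 = rng.choice(num_starts, p=w / w.sum())
    return t0, t0 + T                          # sampled interval [t0, t0 + T)
```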