Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration

Authors: Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Yuan-Hong Liao, Joshua B. Tenenbaum, Sanja Fidler, Antonio Torralba

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "We evaluate the performance of AI agents with the human-like agent as well as with real humans, using objective metrics and subjective user ratings. Experimental results demonstrate that the proposed challenge and virtual environment enable a systematic evaluation of the important aspects of machine social intelligence at scale."

Researcher Affiliation | Collaboration
  Xavier Puig (1), Tianmin Shu (1), Shuang Li (1), Zilin Wang (2), Yuan-Hong Liao (3,5), Joshua B. Tenenbaum (1), Sanja Fidler (3,4,5), Antonio Torralba (1). Affiliations: (1) Massachusetts Institute of Technology, (2) ETH Zurich, (3) University of Toronto, (4) NVIDIA, (5) Vector Institute.

Pseudocode | No
  The paper describes algorithms and their architectures (e.g., Figures 16, 17, 18), but it does not provide pseudocode blocks or sections explicitly labeled "Pseudocode" or "Algorithm".

Open Source Code | Yes
  Code and documentation for the VirtualHome-Social environment are available at https://virtual-home.org. Code and data for the WAH challenge are available at https://github.com/xavierpuigf/watch_and_help.

Open Datasets | Yes
  Code and data for the WAH challenge are available at https://github.com/xavierpuigf/watch_and_help. The authors create a training set with 1011 tasks and two testing sets (test-1 and test-2).

Dataset Splits | No
  The paper mentions a training set and testing sets but does not specify a separate validation set or its split details.

Hardware Specification | No
  The paper states that "the environment can be run in a single or multiple processes. A single process runs at 10 actions per second. We train our models using 10 processes in parallel," but does not specify hardware details such as CPU/GPU models or memory.

Software Dependencies | No
  The paper mentions software components and techniques such as Transformer, LSTM, MCTS, regression planning, word2vec, A2C, and RMSprop, but does not provide version numbers for these, nor for any programming languages or libraries used.

Experiment Setup | Yes
  "The network is updated by RMSprop (Tieleman & Hinton, 2012) with a learning rate of 0.001 and a batch size of 32. Similar to the low-level policy, we use off-policy A2C for policy optimization, and the network is updated by RMSprop with a learning rate of 0.001 and a batch size of 16."
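The paper reports only the optimizer hyperparameters (RMSprop, learning rate 0.001) without code. As a reference for what that update rule does, here is a minimal pure-Python sketch of one RMSprop step (Tieleman & Hinton, 2012); the function name and the flat-list parameter layout are hypothetical, not from the paper's released code:

```python
import math

def rmsprop_step(params, grads, sq_avg, lr=0.001, alpha=0.99, eps=1e-8):
    """One RMSprop update over flat lists of parameters and gradients.

    sq_avg holds the exponential moving average of squared gradients,
    one entry per parameter; it is updated in place alongside params.
    """
    for i, g in enumerate(grads):
        # Running average of squared gradients with decay rate alpha.
        sq_avg[i] = alpha * sq_avg[i] + (1.0 - alpha) * g * g
        # Per-parameter step scaled by the root of the running average.
        params[i] -= lr * g / (math.sqrt(sq_avg[i]) + eps)
    return params, sq_avg

# Example: a single update with the learning rate reported in the paper.
params = [1.0, -2.0]
grads = [0.5, -0.5]
sq_avg = [0.0, 0.0]
params, sq_avg = rmsprop_step(params, grads, sq_avg, lr=0.001)
```

With a zero-initialized average, the first step is roughly lr * g / (sqrt((1 - alpha)) * |g|), so early updates are larger than lr alone would suggest; the decay rate alpha = 0.99 is a common default, since the paper does not report it.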