Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration

Authors: Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Yuan-Hong Liao, Joshua B. Tenenbaum, Sanja Fidler, Antonio Torralba

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "We evaluate the performance of AI agents with the human-like agent as well as with real humans, using objective metrics and subjective user ratings. Experimental results demonstrate that the proposed challenge and virtual environment enable a systematic evaluation of the important aspects of machine social intelligence at scale."

Researcher Affiliation | Collaboration
  Xavier Puig (1), Tianmin Shu (1), Shuang Li (1), Zilin Wang (2), Yuan-Hong Liao (3,5), Joshua B. Tenenbaum (1), Sanja Fidler (3,4,5), Antonio Torralba (1). Affiliations: (1) Massachusetts Institute of Technology, (2) ETH Zurich, (3) University of Toronto, (4) NVIDIA, (5) Vector Institute.

Pseudocode | No
  The paper describes algorithms and their architectures (e.g., Figures 16, 17, 18), but it does not provide pseudocode blocks or sections explicitly labeled "Pseudocode" or "Algorithm".

Open Source Code | Yes
  Code and documentation for the VirtualHome-Social environment are available at https://virtual-home.org. Code and data for the WAH challenge are available at https://github.com/xavierpuigf/watch_and_help.

Open Datasets | Yes
  Code and data for the WAH challenge are available at https://github.com/xavierpuigf/watch_and_help. The authors create a training set with 1011 tasks and two testing sets (test-1 and test-2).

Dataset Splits | No
  The paper mentions a training set and testing sets but does not specify a separate validation set or its split details.

Hardware Specification | No
  The paper states that "the environment can be run in a single or multiple processes. A single process runs at 10 actions per second. We train our models using 10 processes in parallel," but does not specify hardware details such as CPU/GPU models or memory.

Software Dependencies | No
  The paper mentions software components and techniques such as Transformer, LSTM, MCTS, regression planning, word2vec, A2C, and RMSprop, but does not provide version numbers for these, nor for any programming languages or libraries used.

Experiment Setup | Yes
  "The network is updated by RMSprop (Tieleman & Hinton, 2012) with a learning rate of 0.001 and a batch size of 32. Similar to the low-level policy, we use off-policy A2C for policy optimization, and the network is updated by RMSprop with a learning rate of 0.001 and a batch size of 16."
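The paper reports only the optimizer hyperparameters (RMSprop, learning rate 0.001) without code. As a reference for what that update rule does, here is a minimal pure-Python sketch of one RMSprop step (Tieleman & Hinton, 2012); the function name and the flat-list parameter layout are hypothetical, not from the paper's released code:

```python
import math

def rmsprop_step(params, grads, sq_avg, lr=0.001, alpha=0.99, eps=1e-8):
    """One RMSprop update over flat lists of parameters and gradients.

    sq_avg holds the exponential moving average of squared gradients,
    one entry per parameter; it is updated in place alongside params.
    """
    for i, g in enumerate(grads):
        # Running average of squared gradients with decay rate alpha.
        sq_avg[i] = alpha * sq_avg[i] + (1.0 - alpha) * g * g
        # Per-parameter step scaled by the root of the running average.
        params[i] -= lr * g / (math.sqrt(sq_avg[i]) + eps)
    return params, sq_avg

# Example: a single update with the learning rate reported in the paper.
params = [1.0, -2.0]
grads = [0.5, -0.5]
sq_avg = [0.0, 0.0]
params, sq_avg = rmsprop_step(params, grads, sq_avg, lr=0.001)
```

With a zero-initialized average, the first step is roughly lr * g / (sqrt((1 - alpha)) * |g|), so early updates are larger than lr alone would suggest; the decay rate alpha = 0.99 is a common default, since the paper does not report it.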