Collaborating with Humans without Human Data

Authors: DJ Strouse, Kevin McKee, Matt Botvinick, Edward Hughes, Richard Everett

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners.
Researcher Affiliation | Industry | DeepMind {strouse, kevinrmckee, botvinick, edwardhughes, reverett}@deepmind.com
Pseudocode | No | The paper describes the methods in text and uses diagrams (e.g., Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that the authors are releasing code for the described methodology nor links to a code repository.
Open Datasets | No | The paper mentions collecting '5 human-human trajectories of length 1200 time steps for each of the 5 layouts, resulting in 60k total environment steps' for training BC agents, but does not provide concrete access information (link, DOI, repository, or citation with author/year for public access) for this collected data. Although the authors use the Overcooked environment, this variable concerns the dataset used for training.
Dataset Splits | No | The paper does not provide training/validation/test dataset splits (percentages or counts) or cross-validation details for the main models. It mentions splitting the collected human data 'in half' for two BC agents, but this is not a general train/validation/test split for the primary models.
Hardware Specification | No | The paper mentions using a 'distributed set of environments running in parallel' but does not specify the hardware used, such as CPU or GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions using the V-MPO algorithm, ResNet, LSTM, and Acme, but does not provide version numbers for any of these software dependencies.
Experiment Setup | Yes | Both PP and FCP are trained with a population size of N = 32 agents which are sampled uniformly. For FCP, we use 3 checkpoints for each agent...When varying architecture for the training partners of the FCP+A and FCP-T,+A variants, we vary whether the partners use memory (i.e. LSTM vs not) and the width of their policy and value networks (i.e. 16 vs 256).
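
The Experiment Setup entry above can be made concrete with a small sketch. It is not the authors' code (none is released); it only illustrates one reading of the quoted setup, in which 32 agents with 3 checkpoints each yield a pool of 96 frozen partners that is sampled uniformly. The names `Partner`, `build_fcp_pool`, and `sample_partner` are hypothetical, as are the default architecture and the rule that each agent draws its architecture at random when architectures are varied.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Partner:
    """One frozen training partner (hypothetical record, not from the paper)."""
    agent_id: int    # index of the agent within the population
    checkpoint: int  # 0 = earliest saved checkpoint (indexing is an assumption)
    use_lstm: bool   # whether the partner has memory
    width: int       # width of the policy and value networks


def build_fcp_pool(n_agents=32, n_checkpoints=3, vary_architecture=False, seed=0):
    """Build a partner pool: n_agents agents, n_checkpoints checkpoints each."""
    rng = random.Random(seed)
    if vary_architecture:
        # The quoted setup varies memory (LSTM vs not) and width (16 vs 256).
        architectures = list(product([True, False], [16, 256]))
    else:
        # Placeholder default; the summary does not state the default architecture.
        architectures = [(True, 256)]
    pool = []
    for agent_id in range(n_agents):
        # Assumption: each agent draws one architecture, shared by its checkpoints.
        use_lstm, width = rng.choice(architectures)
        for checkpoint in range(n_checkpoints):
            pool.append(Partner(agent_id, checkpoint, use_lstm, width))
    return pool


def sample_partner(pool, rng):
    """Pick a partner uniformly at random from the pool."""
    return rng.choice(pool)


if __name__ == "__main__":
    pool = build_fcp_pool(vary_architecture=True)
    rng = random.Random(1)
    print(len(pool), sample_partner(pool, rng))
```

Under these assumptions, setting `n_checkpoints=1` corresponds to a population without past checkpoints, and `vary_architecture=True` corresponds to the architecture-varied variants named in the quote.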