Collaborating with Humans without Human Data
Authors: DJ Strouse, Kevin McKee, Matt Botvinick, Edward Hughes, Richard Everett
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners. |
| Researcher Affiliation | Industry | DeepMind {strouse, kevinrmckee, botvinick, edwardhughes, reverett}@deepmind.com |
| Pseudocode | No | The paper describes the methods in text and uses diagrams (e.g., Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement that the authors are releasing the code for their described methodology, nor does it provide a direct link to such a repository. |
| Open Datasets | No | The paper mentions collecting '5 human-human trajectories of length 1200 time steps for each of the 5 layouts, resulting in 60k total environment steps' for training BC agents, but does not provide concrete access information (link, DOI, repository, or citation with author/year for public access) for this collected data. While they use the Overcooked environment, the question is specifically about the dataset used for training. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (percentages or counts) or cross-validation details for the main models. It mentions splitting collected human data 'in half' for two BC agents, but this is not a general train/validation/test split for the primary models. |
| Hardware Specification | No | The paper mentions using a 'distributed set of environments running in parallel' but does not provide specific details on the hardware used, such as CPU or GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using the V-MPO algorithm, ResNet, LSTM, and Acme, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Both PP and FCP are trained with a population size of N = 32 agents which are sampled uniformly. For FCP, we use 3 checkpoints for each agent... When varying architecture for the training partners of the FCP+A and FCP-T,+A variants, we vary whether the partners use memory (i.e. LSTM vs not) and the width of their policy and value networks (i.e. 16 vs 256). |
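
The experiment setup quoted in the last row can be made concrete with a small sketch. The Python below is not the authors' code (the report notes that none was released); it only illustrates how a partner pool with the quoted numbers (N = 32 agents, 3 checkpoints each) and the quoted architecture grid (memory on/off, width 16 vs 256) might be enumerated. The names `Partner`, `build_fcp_pool`, `sample_partner`, and `ARCH_GRID` are hypothetical, and sampling uniformly over (agent, checkpoint) pairs is an assumption about what "sampled uniformly" means.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class Partner:
    """One frozen training partner (hypothetical record, not from the paper)."""
    agent_id: int    # which self-play agent the partner came from
    checkpoint: int  # which saved checkpoint of that agent (0-based index)


def build_fcp_pool(n_agents: int = 32, n_checkpoints: int = 3) -> List[Partner]:
    """Enumerate every (agent, checkpoint) pair as a candidate partner."""
    return [
        Partner(agent_id=a, checkpoint=c)
        for a, c in product(range(n_agents), range(n_checkpoints))
    ]


def sample_partner(pool: List[Partner], rng: random.Random) -> Partner:
    """Assumption: partners are drawn uniformly from the whole pool."""
    return rng.choice(pool)


# Architecture grid quoted for the architecture-varying variants:
# memory (LSTM vs not) crossed with network width (16 vs 256).
ARCH_GRID: List[Dict[str, object]] = [
    {"use_lstm": use_lstm, "width": width}
    for use_lstm, width in product((True, False), (16, 256))
]

pool = build_fcp_pool()
rng = random.Random(0)  # fixed seed so the sketch is repeatable
partner = sample_partner(pool, rng)
```

With the quoted values, the pool holds 32 × 3 = 96 partners and the grid holds 4 architecture combinations. Because every agent contributes the same number of checkpoints, drawing uniformly over pairs is equivalent to drawing an agent uniformly and then one of its checkpoints uniformly.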