Symmetric Machine Theory of Mind
Authors: Melanie Sclar, Graham Neubig, Yonatan Bisk
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that reinforcement learning agents that model the mental states of others achieve significant performance improvements over agents with no such theory of mind model. |
| Researcher Affiliation | Academia | ¹Paul G. Allen School of Computer Science & Engineering, University of Washington; ²Language Technologies Institute, Carnegie Mellon University. Correspondence to: <msclar@cs.washington.edu, {gneubig,ybisk}@cs.cmu.edu>. |
| Pseudocode | Yes | A pseudocode of MADDPG-EE's implementation can be found in Section A.4. Algorithm 1: Actor implementation of MADDPG-EE |
| Open Source Code | Yes | Code can be found at https://github.com/msclar/symmtom. |
| Open Datasets | No | The paper introduces a new simulated environment called 'SymmToM' for its experiments instead of using a pre-existing, publicly available dataset with specific access information. |
| Dataset Splits | No | The paper mentions training for '60000 episodes' and evaluating for '1000 episodes' but does not specify any validation dataset splits or percentages. |
| Hardware Specification | Yes | Experiments were run on a server with 256GB RAM, 2 18-core Intel E5-2699 processors @ 2.3GHz. |
| Software Dependencies | No | The paper mentions using MADDPG and its variants (RMADDPG, MADDPG-CE, etc.) as frameworks, but it does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train through 60000 episodes, and with 9 random seeds to account for high variances. Our policies are parametrized by a two-layer ReLU MLP with 64 units per layer... We used the same hyperparameters as the ones used in MADDPG, except with a reduced learning rate and tau (lr = 0.001 and τ = 0.005). We set the length of each episode to 5w |
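
The reported setup can be collected into a small configuration object, shown here as a minimal sketch rather than the authors' code. The class name `ExperimentConfig`, its field names, and the helper `soft_update` are hypothetical. The `soft_update` function illustrates the standard target-network update used by DDPG-style methods, which is assumed to be what τ controls here. The episode length of 5w is left out because w is not defined in the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    num_train_episodes: int = 60000  # "We train through 60000 episodes"
    num_eval_episodes: int = 1000    # evaluation length quoted under Dataset Splits
    num_seeds: int = 9               # "9 random seeds"
    hidden_layers: int = 2           # two-layer ReLU MLP
    hidden_units: int = 64           # 64 units per layer
    learning_rate: float = 0.001     # lr = 0.001
    tau: float = 0.005               # τ = 0.005


def soft_update(target, online, tau):
    """Move each target parameter a fraction tau toward the online parameter.

    Assumed rule: target <- tau * online + (1 - tau) * target.
    """
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


cfg = ExperimentConfig()

# Toy parameter vectors, purely illustrative.
target_params = [0.0, 1.0]
online_params = [1.0, 1.0]
updated = soft_update(target_params, online_params, cfg.tau)
print(updated)
```

With τ this small, the target parameters trail the online parameters slowly, which is the usual motivation for lowering τ when training is unstable.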